Showing posts with label debian. Show all posts

25 March 2008

dcmd now in devscripts

In case anyone out there is using my dcmd script, I'll note that it's now included in devscripts (starting with version 2.10.20). The devscripts version doesn't behave differently when called as dscp or drsync, so I use the following aliases to get the old behavior:

alias dscp='dcmd scp'
alias drsync='dcmd rsync -av'
As always, feedback welcome.

23 March 2008

Cleaner

In the productive procrastination department: I repackaged rcs using debhelper (it feels like the future) and while I was at it, I implemented MadCoder's git maintenance scheme where changes from upstream are maintained in a rebased integration branch and serialized as patches in the Debian branch (master in my case) using git format-patch. It's pretty nice:

  • I can use native Git commands to edit/refresh patches, which beats quilt any day;
  • Changes to upstream are visible both as a git branch and as separated, commented patches in the Debian branch;
  • If necessary (e.g. in case of NMU), outside contributors can just dump a -p1 patch in debian/patches and be done with it.
Of course this is a very low-maintenance package (the last upstream release was 13 years ago) so it doesn't matter very much which maintenance strategy it uses, but after implementing it I'm pretty confident that it works and I might switch my other packages to it.

If you're interested in knowing more about clever ways to use modern VCS for packaging, you may want to join the vcs-pkg mailing-list.

18 March 2008

GNOME 2.22 in sid: check (mostly)


The status page speaks for itself: most of GNOME 2.22 is now available in sid. A few packages are being staged in experimental to avoid breaking too much stuff, and a few of the remaining missing pieces are waiting in NEW. Special thanks to Sebastian Dröge, who's done incredible work with more than 90 uploads over the past week!

16 March 2008

Updated Debian Vcs-* statistics

Today I discovered that I have a nightly cron job on one of my machines to generate the Vcs statistics I mentioned previously; somehow I had forgotten. So here's an updated graph:

Subversion and Git are still growing fast; the others aren't very successful. 18.8% of all source packages in main are maintained in Subversion, and Git is up to 4.3%.

It's also interesting to note that 91% of the packages maintained in Subversion are hosted on Alioth (svn.debian.org) whereas 77% of the packages maintained in Git are on git.debian.org.

01 January 2008

Debian 2007 timeline

[Reposted from debian-publicity.]

I started a page on the wiki to assemble a timeline of the interesting events of 2007: Timeline2007.

Please help me complete it! I've collected events from the News section of the website, times.debian.net and the -devel-announce and -project lists. I may have missed some important items, so feel free to add your own.

03 November 2007

Locating old Debian source packages

So you want to import your pet package into the new and shiny VCS of the day, but you don't have the complete source history at hand, and some of the revisions are too old to be on snapshot.debian.net or were never part of a Debian release and aren't on archive.debian.org?

Here's an easy way out:

  1. For each missing source package, look into the debian-devel-changes archive for the MD5 sum of the corresponding .diff.gz file; it's present in the mail sent by katie.
  2. Google this MD5 sum. If your source package is sitting somewhere in the ether, the dsc file contains the sum and Google probably indexes it.
  3. Fetch your source package, check the signature on the dsc and the integrity of the .diff.gz. If everything looks good, go to step 1.
For the record, I found some of mine on old-releases.ubuntu.com, apt.freespire.org and these snapshots.
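The integrity check in step 3 can be sketched in Python. This is only an illustration: the function names are made up for this example, and it covers the MD5 comparison only, not the signature check on the dsc.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announced_sum(path, announced_md5):
    """Compare a downloaded file against the sum copied from the announcement.

    `announced_md5` is whatever was pasted from the mail or the dsc, so
    surrounding whitespace and letter case are normalized first.
    """
    return md5_of_file(path) == announced_md5.strip().lower()
```

If `matches_announced_sum()` returns False for the .diff.gz, the copy found in the wild is not the file that was originally uploaded and shouldn't be imported.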

04 October 2007

Balance is but a shimmering notion

After beating git-debimport into submission for some time, I now have a reasonably coherent git archive of tcpdump with proper commit dates, correct attribution and a tag layout compatible with what git-buildpackage does.

Along the way I tried out guilt (git + quilt). Unfortunately, my trust level almost immediately reached zero.

Apart from that, today pretty much sucked. Prepare your sleep apparatus. Better luck tomorrow.

30 September 2007

Clint rocks

19h57:

Subject: Bug#444747: Acknowledgement (git-mergetool completion missing)

Thank you for the problem report you have sent regarding Debian.
This is an automatically generated reply, to let you know your message has
been received. It is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
Clint Adams <schizo@debian.org>
21h24:
Subject: Bug#444747: fixed in zsh 4.3.4-19

This is an automatic notification regarding your Bug report
which was filed against the zsh package:

#444747: git-mergetool completion missing

It has been closed by Clint Adams <schizo@debian.org>.

19 September 2007

In an expression of the inexpressible

One year ago today:

FOR IMMEDIATE RELEASE

19th September, 2006 -- Dunc-Tank.org is pleased to announce its first fund-raising experiment: collecting donations to help Debian GNU/Linux 4.0, codenamed etch, be released on schedule on the 4th of December, 2006.

Dunc-Tank.org aims to support Debian's efforts to meet its release schedule for etch by financially supporting the volunteers working on managing the release process, allowing them to devote their full attention to that task. [...]

30 August 2007

Random fact of the day

Over the last five months I have orphaned or given away a total of 12 packages. It feels good.

27 August 2007

More fun with md5sums: collisions

During my experiments with file duplication in the Debian archive, it occurred to me that having a list of files with identical MD5 hashes was a good starting point for finding MD5 collisions in the archive. If any of those files were different but had the same hash as computed by MD5, I'd have a collision.

Unfortunately, checking if the files differ involves extracting the data tarball of each affected package and computing another hash for the files (I used SHA-1), which takes a while. 3h32m and 47GB of extracted files later, I now have the results and there is no MD5 collision in the Debian archive. The chances were slim if not null, but at least now I'm sure.

(In fact I did find one occurrence, but it turned out that the file's MD5 hash in DEBIAN/md5sums was incorrect, for some reason.)
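The check boils down to grouping files by their recorded MD5 sum and comparing a second hash within each group. Here's a minimal sketch of that idea, assuming the files are already extracted and that `entries` holds (recorded MD5, path) pairs; the function names are hypothetical and this is not the script used for the run above.

```python
import hashlib
from collections import defaultdict


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def suspected_collisions(entries):
    """Return {md5: [paths]} for MD5 groups whose files differ by SHA-1.

    `entries` is an iterable of (recorded_md5, path) pairs, where the MD5
    is taken as recorded (e.g. from DEBIAN/md5sums), not recomputed.
    """
    groups = defaultdict(list)
    for md5sum, path in entries:
        groups[md5sum].append(path)

    suspects = {}
    for md5sum, paths in groups.items():
        if len(paths) < 2:
            continue
        if len({sha1_of_file(path) for path in paths}) > 1:
            suspects[md5sum] = paths
    return suspects
```

Because the MD5 values are trusted as recorded, a wrong entry in DEBIAN/md5sums shows up as a suspect too, so every hit still needs its MD5 recomputed before calling it a collision.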

21 August 2007

A study of file duplication in the Debian archive

[Long post, sorry. If you're short on time skip to the end for the juicy parts.]

It all started about a week ago when I decided to find out how many files were unique in the whole Debian archive, and how many were duplicated in the same package or in other packages. I had done some work on duplication detection before, and I knew that the process involves getting some kind of hash value of every document, then finding duplicate values in the list of hashes.

I already had a local Debian mirror so the raw data was all there. I basically had two options: either compute the hash value of every file myself from the packages in my mirror, or find some other (ideally faster) way to determine if a file is unique. I quickly decided that the best option was to use the md5sums embedded in most Debian packages (in the DEBIAN/md5sums control file). That would give me MD5 hashes of all regular files, excluding conffiles.

So the first step was to check how many packages have embedded md5sums, and a simple script showed that less than 3% don't have them. This first check exposed a bug in python-debian, which was duly reported. Along the way, my post prompted a discussion on debian-devel about the state of md5sums, and I set up a daily check to keep track of things.

The next step was to make sure that all md5sum-enabled packages had usable information. It turned out that python-debian choked on 103 packages because of embedded spaces in filenames, and that it also had a blocker bug when used in Python 2.4. I filed two more bugs.

All that remained was the easiest part: write the program to find duplicate files. I did that, and I now have all kinds of funny statistics:

  • The dataset consists of 20170 packages with md5sums, shipping a total of 2069830 files. That gives an average ratio of 102 files per package, excluding conffiles.
  • There are 113732 duplicates in the archive. 1556 files are duplicated more than 10 times, and 14 files are duplicated more than 300 times.
  • The empty file is present 8325 times in the archive, spread over 874 packages. This isn't surprising since it's used for all kinds of purposes like Python's __init__.py files, Perl's .bs files, etc. I also learned (among other oddities) that the python2.4-doc package ships a few zero-byte .png files. Uh?
  • Also popular is the file with just one newline character in it: 343 occurrences. In the same vein, we have 461 occurrences of the "deny from all\n" file.
  • Most of the hits are for Doxygen images in -dev and -doc packages, namely doxygen.png, tab_b.gif, tab_l.gif, etc (about 350 hits each). In the same category, gjdoc CSS files (149 hits).
  • The partlibrary package is our worst offender. It ships a total of 9680 non-empty files, and only 4833 of them are unique. 6 files are duplicated more than 400 times each in the package.
If you're still reading, thanks! You can see the report file, and the program itself. If you want to run it, note that it's optimized for speed rather than memory efficiency; it runs in under 3 minutes but uses up to 1.5GB of memory (my home desktop has 4GB).
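For readers who just want the gist, the core of such a duplicate finder fits in a few lines of Python. This is a simplified sketch and not the program linked above: it assumes the md5sums data has already been pulled out of each package into a dict mapping package name to text, and the function names are made up.

```python
from collections import defaultdict


def parse_md5sums(text):
    """Yield (md5, path) pairs from md5sums-style lines.

    Only the first run of whitespace is used as the separator, so
    filenames with embedded spaces survive intact.
    """
    for line in text.splitlines():
        if not line.strip():
            continue
        md5sum, path = line.split(None, 1)
        yield md5sum, path


def find_duplicates(packages):
    """Return {md5: [(package, path), ...]} for hashes seen more than once.

    `packages` maps a package name to the text of its md5sums file.
    Everything is kept in memory, trading memory for speed.
    """
    seen = defaultdict(list)
    for name, text in packages.items():
        for md5sum, path in parse_md5sums(text):
            seen[md5sum].append((name, path))
    return {md5sum: where for md5sum, where in seen.items() if len(where) > 1}
```

Sorting the result by `len(where)` gives the "most duplicated" ranking, and counting distinct package names per hash tells apart duplication within a package from duplication across packages.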

The grand conclusion is that all things considered, there is very little file duplication in the archive: the #1 duplicate represents about 0.4% of the two million analyzed files, and it doesn't actually use any space since it's an empty file... :)

19 August 2007

bzip2 compression in debs

During my previous adventures with the Debian archive, I found that two packages in the archive use bzip2 compression inside the .deb instead of the traditional gzip compression, so I decided to try it out on emacs-snapshot (one of my larger packages). The combined size of the deb files goes from 36764KB to 33880KB, a 2884KB (7.8%) difference. It also makes both lintian and linda unhappy; the former gives me the following error:

E: emacs-snapshot-nox: deb-data-member-wrongly-compressed
N:
N: The binary package contains a data member not compressed with gzip.
N: From dpkg-dev 1.11 on, you can configure the way the data tarball is
N: compressed. Though this is possible, you are not allowed to use it
N: before dpkg 1.11 (or later) enters stable.
and linda just bombs:
E: emacs-snapshot-common; Package uses a newer feature of dpkg.
This package uses a data.tar, or data.tar.bz2 member of the .deb. This
was introduced in dpkg 1.11, but is not allowed to be used until dpkg
1.11 or later hits stable.
File ...3_all.deb failed to process: Level 2 unpacking failed:
Could not unpack data tarball
Etch was released with dpkg 1.13.25, so bzip2 compression is probably allowed now. But is a 7.8% saving worth the incompatibility price? I'm not sure.

16 August 2007

Debian packages without md5sums

Random testing of my local Debian mirror shows that 644 binary packages out of 20774 (3.1%) are missing the DEBIAN/md5sums control file. This file (used by debsums) is very useful to check for disk corruption, and even though debsums can generate the data on the fly when a package is installed, it's better to have this information computed at build time.

So if your packages use debhelper, make sure that you have the proper dh_md5sums call in debian/rules. If you don't use debhelper, you'll have to generate the DEBIAN/md5sums control file manually (see dh_md5sums's source for inspiration). If you use a high-level build tool like CDBS, you probably don't have to do anything.
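For the manual case, here's a rough Python sketch of what generating the file involves. It is not dh_md5sums's actual implementation; the function name, the `conffiles` parameter and the sorting are assumptions made for this example. The output uses the usual md5sum layout (hash, two spaces, path relative to the package root).

```python
import hashlib
import os


def md5sums_lines(package_root, conffiles=()):
    """Return md5sums-style lines for the regular files under `package_root`.

    The top-level DEBIAN control directory, symlinks and any path listed
    in `conffiles` (absolute paths, as in DEBIAN/conffiles) are skipped.
    """
    skipped = {path.lstrip("/") for path in conffiles}
    entries = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        if dirpath == package_root and "DEBIAN" in dirnames:
            dirnames.remove("DEBIAN")  # do not descend into control files
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            relative = os.path.relpath(full, package_root)
            if relative in skipped:
                continue
            with open(full, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            entries.append((relative, digest))
    return [f"{digest}  {relative}" for relative, digest in sorted(entries)]
```

Writing `"\n".join(lines) + "\n"` to DEBIAN/md5sums just before the package is built would then give debsums something to check against.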

Should I file bugs against the 446 affected source packages? A few maintainers apparently exclude some binary packages on purpose; for example the zsh source package generates md5sums for zsh, zsh-static and zsh-doc, but not for zsh-dev and zsh-dbg...

Update: updated statistics with a run on a full amd64+all mirror.

18 June 2007

dcmd

4:30am and I can't sleep... which in a way is good since it gave me some time to think about the script I posted yesterday and how it could be generalized into a simpler and far more powerful dcmd script which just expands any .dsc/.changes argument found in the command line:

elegiac$ dcmd scp -C /tmp/1/dasher_4.4.2-1.dsc romain@yeast:/tmp
dasher_4.4.2.orig.tar.gz 100% 8728KB 4.3MB/s 00:02
dasher_4.4.2-1.diff.gz 100% 5231 5.1KB/s 00:00
dasher_4.4.2-1.dsc 100% 1259 1.2KB/s 00:00
elegiac$
Its only limitation is that the .dsc/.changes must be local in order for Python to parse it.
elegiac$ dcmd sha1sum dasher_4.4.2-2_amd64.changes
87cd6eb29d5f54c1547b2bf7531a0e303dbc2fa3 dasher_4.4.2-2.dsc
70d0729c8955309efa230832cf5e82bd5e28cf7c dasher_4.4.2-2.diff.gz
a5c15d3e6c3b297897651e9a5aa40451c73b2420 dasher-data_4.4.2-2_all.deb
68383058289fb41cdf19d57225afafafd4032719 dasher_4.4.2-2_amd64.deb
481e2008ad78be0721335634954dbe3a0439914b dasher_4.4.2-2_amd64.changes
elegiac$
Sound useful?
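The expansion step itself can be sketched in a few lines of Python. This is an illustration of the idea rather than the dcmd source: it reads the Files field by hand instead of going through a Debian control-file library, and the function names are invented for the example.

```python
import os


def listed_files(control_text):
    """Return the filenames listed in the Files field of a .dsc or .changes.

    Each continuation line of the field ends with the filename, whether it
    has three columns (.dsc) or five (.changes).
    """
    names = []
    in_files = False
    for line in control_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
        elif in_files and line.startswith((" ", "\t")) and line.strip():
            names.append(line.split()[-1])
        else:
            in_files = False
    return names


def expand_args(args):
    """Replace each local .dsc/.changes argument with its files, then itself."""
    expanded = []
    for arg in args:
        if arg.endswith((".dsc", ".changes")) and os.path.isfile(arg):
            directory = os.path.dirname(arg)
            with open(arg, encoding="utf-8", errors="replace") as handle:
                for name in listed_files(handle.read()):
                    expanded.append(os.path.join(directory, name))
        expanded.append(arg)
    return expanded
```

Handing `expand_args(sys.argv[1:])` to something like `os.execvp` would then run the wrapped command on the expanded list. Arguments that aren't existing local files, such as `romain@yeast:/tmp`, pass through untouched.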

PS: Christoph, I know that dput supports scp and rsync, but it requires a host definition in ~/.dput.cf, and works only on .changes files.

17 June 2007

A .dsc/.changes aware scp

Every time I make a new emacs-snapshot release I have to copy the source package from my amd64 desktop to my i386 laptop to build it, then copy the built packages back in order to upload them. This scp'ing gets tiring after a while, so I wrote a trivial script called dscp which acts as a Debian-aware wrapper around scp, using python-deb822. It's similar in spirit to Myon's dget script, which does pretty much the same thing with wget.

Note that unlike scp it can only copy files to other machines, and currently does no command-line parsing at all, so you cannot pass options to scp (other than the user@host:/bla part, of course).

Also note that if the destination is local it will serve as a .dsc/.changes aware cp, which is also handy in some situations.

If there's interest I could probably add some sanity checks and propose it for inclusion into devscripts.

19 March 2007

My DPL 2007 vote

Same as last year: NOTA.