19 August 2007

bzip2 compression in debs

During my previous adventures with the Debian archive, I found that two packages in the archive use bzip2 compression inside the .deb instead of the traditional gzip, so I decided to try it out on emacs-snapshot (one of my larger packages). The combined size of the .deb files goes from 36764KB to 33880KB, a 2884KB (7.8%) reduction. It also makes both lintian and linda unhappy; the former gives me the following error:

E: emacs-snapshot-nox: deb-data-member-wrongly-compressed
N:
N: The binary package contains a data member not compressed with gzip.
N: From dpkg-dev 1.11 on, you can configure the way the data tarball is
N: compressed. Though this is possible, you are not allowed to use it
N: before dpkg 1.11 (or later) enters stable.
and linda just bombs:
E: emacs-snapshot-common; Package uses a newer feature of dpkg.
This package uses a data.tar, or data.tar.bz2 member of the .deb. This
was introduced in dpkg 1.11, but is not allowed to be used until dpkg
1.11 or later hits stable.
File ...3_all.deb failed to process: Level 2 unpacking failed:
Could not unpack data tarball
Etch was released with dpkg 1.13.25, so bzip2 compression is probably allowed now. But is a 7.8% saving worth the incompatibility price? I'm not sure.
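A .deb is an ar archive whose members are normally debian-binary, control.tar.gz and the data tarball, and the name of that last member (data.tar.gz, data.tar.bz2 or plain data.tar) is what the lintian and linda checks above complain about. Below is a minimal Python sketch, not taken from lintian or dpkg, that walks the common ar format by hand and reports which compression the data member uses. The function names are hypothetical.

```python
"""Minimal sketch: list the members of a .deb (an ar archive) and report
how its data member is compressed. Not lintian's or dpkg's own code."""

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # every ar member header is exactly 60 bytes


def ar_member_names(blob: bytes) -> list[str]:
    """Return member names from a common-format ar archive.

    GNU long-name tables are not handled; .deb member names are short.
    """
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(blob):
        header = blob[offset:offset + HEADER_SIZE]
        # Bytes 0-15: name, space padded; GNU ar may add a trailing '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Bytes 48-57: member size in decimal ASCII.
        size = int(header[48:58].decode("ascii"))
        names.append(name)
        # Member data is padded with '\n' to an even length.
        offset += HEADER_SIZE + size + (size % 2)
    return names


def data_compression(names: list[str]) -> str:
    """Map the data.tar* member name to a compression label."""
    for name in names:
        if name == "data.tar":
            return "none"
        if name.startswith("data.tar."):
            return name[len("data.tar."):]  # e.g. "gz" or "bz2"
    raise ValueError("no data.tar member found")
```

For a real package, pass the file's bytes, e.g. `data_compression(ar_member_names(open(path, "rb").read()))`, where `path` is whichever .deb you want to check; on a Debian system `ar t package.deb` gives the same member listing. Reading the whole file into memory keeps the sketch short; a real tool would seek past member bodies instead.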

5 comments:

Adrian Bunk said...

The incompatibility should not be a problem, since skipping a stable release when upgrading is not supported anyway.

A more important question is whether bzip2 would be a good choice: its slower decompression would hurt slower computers.
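Whether bzip2's slower decompression matters can be measured rather than guessed. Below is a rough Python sketch, using only the standard gzip, bz2 and time modules, that compresses one payload with each codec and records the compressed size and the best-of-N decompression time. The function name `compare` and the choice of level 9 for both codecs are assumptions for illustration; timings depend heavily on the input and the machine.

```python
"""Rough sketch: compare gzip and bzip2 on the same input, measuring
compressed size and decompression time with the standard library."""
import bz2
import gzip
import time


def compare(payload: bytes, rounds: int = 5) -> dict[str, tuple[int, float]]:
    """Return {codec: (compressed_size_bytes, best_decompress_seconds)}."""
    codecs = {
        "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
        "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(payload)
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            restored = decompress(blob)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        assert restored == payload  # sanity check: lossless round trip
        results[name] = (len(blob), best)
    return results
```

Feeding it the uncompressed data tarball of a large package (extracted beforehand, e.g. with `dpkg-deb --fsys-tarfile`) and running it on the slowest machine you care about would give a concrete answer for that machine.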

Henry said...

Emacs will probably give you better-than-average savings, too: *.elc files are pretty compressible. I doubt you'd get 7.8% savings on average.

Romain Francoise said...

Adrian, I meant the incompatibility with tools other than dpkg, like python-debian etc. This is how I found out about these two packages (bug #438486).

Henry: good point; the saving on emacs-snapshot-common alone (which ships the .elc files) is 13.2%.

Anonymous said...

You can save a lot more still using lzma compression. Check http://www.linuks.mine.nu/sizematters/
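The same kind of comparison can include LZMA, which Python's standard lzma module provides. This is a minimal sketch, not the methodology behind the linked page: `percent_saving` applies the same formula as the 7.8% figure in the post, (36764 - 33880) / 36764 ≈ 7.8%, and `savings_vs_gzip` reports each codec's saving relative to gzip at level 9. Note that `lzma.compress` writes the .xz container by default; the older raw .lzma format needs `format=lzma.FORMAT_ALONE`. The function names and level choices are illustrative assumptions.

```python
"""Sketch: compressed size of one payload under gzip, bzip2 and LZMA
(xz container), expressed as percentage saved relative to gzip."""
import bz2
import gzip
import lzma


def percent_saving(old_size: int, new_size: int) -> float:
    """Percentage by which new_size is smaller than old_size."""
    return 100.0 * (old_size - new_size) / old_size


def sizes(payload: bytes) -> dict[str, int]:
    """Compressed size in bytes for each codec at its highest level."""
    return {
        "gzip -9": len(gzip.compress(payload, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(payload, compresslevel=9)),
        "xz -9": len(lzma.compress(payload, preset=9)),
    }


def savings_vs_gzip(payload: bytes) -> dict[str, float]:
    """Saving of each codec relative to gzip -9 (gzip itself is 0.0)."""
    s = sizes(payload)
    base = s["gzip -9"]
    return {name: percent_saving(base, size) for name, size in s.items()}
```

As Henry points out above, the result depends strongly on the payload, so it is worth running this over a representative sample of packages rather than one large one.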

chithanh said...

Certainly the numbers should be checked carefully before speculating about advantages and drawbacks.

Concerning the slower decompression, it would be interesting to know how much time an apt-get dist-upgrade from Sarge to Etch actually spends inside gzip.

The compression gains from the link in the previous comment are indeed impressive. This might not only result in fewer installation media, but also allow for more packages on the first CD/DVD.