An update on md5sums, and Debian's growth

Back in August 2007 I looked at the state of embedded md5sums in Debian packages and found that approximately 3% of the files in the archive didn't have checksums. Three years later, things have improved: only 0.76% of the archive is now missing checksums (sid, main/contrib/non-free). (See this lintian report for the list of affected packages.)

Since then there have also been various discussions on this subject, and there is now a policy bug open to make md5sums a requirement (at the "should" level). There is also a wishlist bug against debhelper to turn dh_md5sums into dh_checksums with a stronger hash algorithm. However, since MD5 is still good enough for simple integrity checking, it seems rather pointless to upgrade the algorithm without a trust path in the form of in-package signatures à la RPM...
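
To make "simple integrity checking" concrete, here is a minimal Python sketch of what such a check amounts to. It assumes the usual md5sums layout (one line per file: the hex digest, whitespace, then the path relative to the filesystem root) and the usual location under /var/lib/dpkg/info/. The function names md5_of and check_md5sums are mine, not part of any Debian tool. Note that this only catches accidental corruption: anyone who can modify the files can also modify the md5sums list, which is exactly why a stronger hash alone buys little.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(md5sums_file, root="/"):
    """Compare files under `root` against an md5sums list.

    Returns a list of (relative_path, problem) tuples, where problem
    is either "missing" or "mismatch". An empty list means every
    listed file is present and matches its recorded digest.
    """
    problems = []
    with open(md5sums_file, encoding="utf-8", errors="surrogateescape") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed format: "<hex digest>  <path relative to root>"
            expected, rel_path = line.split(None, 1)
            target = Path(root) / rel_path
            if not target.is_file():
                problems.append((rel_path, "missing"))
            elif md5_of(target) != expected:
                problems.append((rel_path, "mismatch"))
    return problems
```

On an installed system, something like check_md5sums("/var/lib/dpkg/info/dpkg.md5sums") should return an empty list if nothing has been altered (multiarch packages carry an architecture suffix in the file name).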

Anyway, what's perhaps more surprising is the growth of the distribution in only three years: sid has gone from 20774 to 30314 packages, a 46% increase. Similarly, the number of regular files has gone from approximately 2 million to just above 2.9 million.
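
For anyone who wants to redo the arithmetic, here is a small Python sketch. The package counts are the exact figures quoted above; the file counts are the rounded "2 million" and "2.9 million", so that second result is only approximate. The helper name percent_increase is just for illustration.

```python
def percent_increase(old, new):
    """Relative growth from `old` to `new`, expressed as a percentage."""
    return (new - old) / old * 100.0


# Exact package counts for sid, 2007 vs. 2010.
packages = percent_increase(20774, 30314)

# Rounded file counts, so treat this one as a rough estimate.
files = percent_increase(2_000_000, 2_900_000)

print(f"packages: {packages:.1f}%")
print(f"files:    {files:.1f}%")
```

The package figure comes out at about 45.9%, and the rounded file counts give roughly 45%, so packages and files have grown at nearly the same pace.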

Indeed, looking at our last five releases, the distribution's growth is impressive:

Whether or not that is a good thing is, of course, yet to be determined. As a data point, I used the UDD to find out how many of these thousands of packages are actually used, and to my surprise, 22321 binary packages have a popcon installation count of less than 500! (By comparison, dpkg's installation count is 89393.) So while each new release adds lots of packages, the majority of them have very few users.

If you want to check for yourself, the query I used is:

    select p.package, version, insts
    from packages p, popcon
    where (p.architecture = 'i386' or p.architecture = 'all')
      and p.release = 'squeeze'
      and p.package = popcon.package
      and popcon.insts < 500
    order by insts;

Posted August 28, 2010