<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-968657991057088749.post2684918311466970340..comments</id><updated>2007-09-08T01:45:35.628Z</updated><title type='text'>Comments on Individuation Redux: A study of file duplication in the Debian archive</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.orebokech.com/feeds/2684918311466970340/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default'/><link rel='alternate' type='text/html' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html'/><author><name>Romain Francoise</name><uri>http://www.blogger.com/profile/10308172246322398662</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>4</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-968657991057088749.post-4838193682613501307</id><published>2007-09-08T01:45:35.628Z</published><updated>2007-09-08T01:45:35.628Z</updated><title type='text'>You might be interested in a similar project calle...</title><content type='html'>You might be interested in a similar project called &lt;A HREF="https://svn.cse.ucdavis.edu/trac/cdr/" REL="nofollow"&gt;CDR (checksums done right)&lt;/A&gt; that I'm involved with. It's basically Tripwire using vendor checksums (but since most debs don't have SHA-256 sums, we generate them ourselves). We use virtualization to verify a system while it is online.&lt;BR/&gt;&lt;BR/&gt;We currently have support for deb and rpm, and we're looking at Windows support as well. 
On the deb side that's Ubuntu mainly, but it could easily be extended to Debian.&lt;BR/&gt;&lt;BR/&gt;A few stats:&lt;BR/&gt;&lt;BR/&gt; * Architectures i386 and amd64 only&lt;BR/&gt; * CentOS 4 and 5, and Ubuntu dapper through gutsy&lt;BR/&gt; * 7,033,495 unique files (via SHA-256)&lt;BR/&gt; * 134,134 unique packages&lt;BR/&gt;&lt;BR/&gt;We haven't done much with tracking maintainers, but that's a good idea, and we'll likely look into some sort of trust system based on the maintainer, more for revocation reasons than for actual trust. If you'd like to contribute or collaborate, email me (scott at cse ucdavis education).</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/4838193682613501307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/4838193682613501307'/><link rel='alternate' type='text/html' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html?showComment=1189215935628#c4838193682613501307' title=''/><author><name>sc0ttbeardsley</name><uri>http://www.blogger.com/profile/04816316891548764722</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html' ref='tag:blogger.com,1999:blog-968657991057088749.post-2684918311466970340' source='http://www.blogger.com/feeds/968657991057088749/posts/default/2684918311466970340' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-968657991057088749.post-8771297810360867763</id><published>2007-08-22T06:05:36.237Z</published><updated>2007-08-22T06:05:36.237Z</updated><title type='text'>It would be easy to do, but it requires getting th...</title><content type='html'>It would be easy to do, but it requires getting the sizes from the data tarball, which has a significant CPU cost (it must be uncompressed 
first).</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/8771297810360867763'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/8771297810360867763'/><link rel='alternate' type='text/html' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html?showComment=1187762736237#c8771297810360867763' title=''/><author><name>Romain Francoise</name><uri>http://www.blogger.com/profile/10308172246322398662</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='05132636648155326799'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html' ref='tag:blogger.com,1999:blog-968657991057088749.post-2684918311466970340' source='http://www.blogger.com/feeds/968657991057088749/posts/default/2684918311466970340' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-968657991057088749.post-8272251211850983107</id><published>2007-08-21T23:04:27.995Z</published><updated>2007-08-21T23:04:27.995Z</updated><title type='text'>I like all the statistics, but you left out the on...</title><content type='html'>I like all the statistics, but you left out the one I was most interested in seeing: the amount of wasted space due to duplication, both as an absolute value and as a % of the total archive. 
Although from your description of the duplicated files, I'm guessing it will be low.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/8272251211850983107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/8272251211850983107'/><link rel='alternate' type='text/html' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html?showComment=1187737467995#c8272251211850983107' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html' ref='tag:blogger.com,1999:blog-968657991057088749.post-2684918311466970340' source='http://www.blogger.com/feeds/968657991057088749/posts/default/2684918311466970340' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-968657991057088749.post-2433897642130940434</id><published>2007-08-21T21:28:37.188Z</published><updated>2007-08-21T21:28:37.188Z</updated><title type='text'>Well it certainly uses inodes ;)</title><content type='html'>Well it certainly uses inodes ;)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/2433897642130940434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/968657991057088749/2684918311466970340/comments/default/2433897642130940434'/><link rel='alternate' type='text/html' href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html?showComment=1187731717188#c2433897642130940434' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' 
href='http://blog.orebokech.com/2007/08/study-of-file-duplication-in-debian.html' ref='tag:blogger.com,1999:blog-968657991057088749.post-2684918311466970340' source='http://www.blogger.com/feeds/968657991057088749/posts/default/2684918311466970340' type='text/html'/></entry></feed>