{"id":182,"date":"2008-02-24T22:42:11","date_gmt":"2008-02-24T20:42:11","guid":{"rendered":"http:\/\/glandium.org\/blog\/?p=182"},"modified":"2010-01-27T08:52:29","modified_gmt":"2010-01-27T07:52:29","slug":"testing-the-crazy-idea","status":"publish","type":"post","link":"https:\/\/glandium.org\/blog\/?p=182","title":{"rendered":"Testing the crazy idea"},"content":{"rendered":"<p>So, <a href=\"http:\/\/madism.org\">MadCoder<\/a> had doubts about the <a href=\"\/blog\/?p=181\">crazy idea<\/a>, so I took a little time to do a test with a package I maintain, namely, xulrunner.<\/p>\n<p>First, take all the deb files in the history of the package (at least, everything that is available on <a href=\"http:\/\/snapshot.debian.net\">snapshot.debian.net<\/a> today).<\/p>\n<blockquote><p><code>wget -O - -q http:\/\/snapshot.debian.net\/archive\/pool\/x\/xulrunner\/binary-amd64\/Packages.gz | gzip -cd &gt; list<br \/>\nawk '\/^Filename:\/{print $2}' list | xargs -I{} wget http:\/\/snapshot.debian.net\/archive\/{}<\/code><\/p><\/blockquote>\n<p>Next, commit all these in a different repo per package name :<\/p>\n<blockquote><p><code>perl -e 'use Dpkg::Version qw(vercmp); sub v { my $f = $_[0]; $f  =~ s\/.*_(.*)_.*\/$1\/; $f } print sort { vercmp(v($a), v($b)); } map { s\/^Filename: .*\\\/\/\/; $_ } grep { \/^Filename:\/ } <>;' list | while read f; do<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;pkg=${f%_*_*}<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;[ ! 
-d $pkg ] && mkdir $pkg && ( cd $pkg ; git init )<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;cd $pkg<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;ar -x ..\/$f<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;mkdir data control<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;tar -C data -zxf data.tar.gz<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;tar -C control -zxf control.tar.gz<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;git add data control<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;git commit -q -m $f<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;rm -rf data control data.tar.gz control.tar.gz debian-binary<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;cd ..<br \/>\ndone<\/code><\/p><\/blockquote>\n<p>Finally, measure three sizes for each package: the total of all its .deb files, their contents imported into git (only the .git directory, including the index; some space could be saved by removing it), and the \"optimized\" git repository (after <code>git gc<\/code>, without tuning the delta depth or window size, which might improve the result further).<\/p>\n<blockquote><p><code>awk '\/^Package:\/{print $2}' list | sort -u | while read p; do<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;du -c --si ${p}_*.deb | tail -1<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;du -s --si $p<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;cd $p; git gc ; cd ..<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;du -s --si $p<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;echo<br \/>\ndone<\/code><\/p>\n<p><code>17M\ttotal<br \/>\n16M\tlibmozjs0d<br \/>\n4.7M\tlibmozjs0d<\/code><\/p>\n<p><code>34M\ttotal<br \/>\n31M\tlibmozjs0d-dbg<br \/>\n14M\tlibmozjs0d-dbg<\/code><\/p>\n<p><code>4.7M\ttotal<br \/>\n7.1M\tlibnspr4-0d<br \/>\n1.3M\tlibnspr4-0d<\/code><\/p>\n<p><code>9.2M\ttotal<br \/>\n12M\tlibnspr4-0d-dbg<br \/>\n2.8M\tlibnspr4-0d-dbg<\/code><\/p>\n<p><code>25M\ttotal<br \/>\n28M\tlibnss3-0d<br \/>\n4.6M\tlibnss3-0d<\/code><\/p>\n<p><code>81M\ttotal<br \/>\n61M\tlibnss3-0d-dbg<br \/>\n21M\tlibnss3-0d-dbg<\/code><\/p>\n<p><code>15M\ttotal<br \/>\n16M\tlibnss3-tools<br \/>\n3.1M\tlibnss3-tools<\/code><\/p>\n<p><code>288M\ttotal<br \/>\n265M\tlibxul0d<br \/>\n146M\tlibxul0d<\/code><\/p>\n<p><code>2.0G\ttotal<br 
\/>\n1.8G\tlibxul0d-dbg<br \/>\n1.1G\tlibxul0d-dbg<\/code><\/p>\n<p><code>5.1M\ttotal<br \/>\n6.8M\tpython-xpcom<br \/>\n1.1M\tpython-xpcom<\/code><\/p>\n<p><code>2.5M\ttotal<br \/>\n5.6M\tspidermonkey-bin<br \/>\n861k\tspidermonkey-bin<\/code><\/p>\n<p><code>13M\ttotal<br \/>\n14M\txulrunner<br \/>\n2.0M\txulrunner<\/code><\/p>\n<p><code>3.3M\ttotal<br \/>\n5.9M\txulrunner-gnome-support<br \/>\n979k\txulrunner-gnome-support<br \/>\n<\/code><\/p><\/blockquote>\n<p>So these packages, stored in git and repacked, take between 15 and roughly 55 percent of the combined .deb size, which may be a nice improvement. It would be interesting to know how these numbers evolve with time. Some files, such as the changelog.Debian.gz files, would also benefit from being stored in plain text instead of gzipped form.<\/p>\n<p>Note that <code>git gc<\/code> took a while and a lot of memory for libxul0d-dbg. Also note that these figures don't include the delta files that would be needed to recreate the original .deb files, but those shouldn't make a huge difference.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, MadCoder had doubts about the crazy idea, so I took a little time to do a test with a package I maintain, namely, xulrunner. First, take all the deb files in the history of the package (at least, everything that is available on snapshot.debian.net today). 
wget -O &#8211; -q http:\/\/snapshot.debian.net\/archive\/pool\/x\/xulrunner\/binary-amd64\/Packages.gz | gzip -cd &gt; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,5],"tags":[23],"class_list":["post-182","post","type-post","status-publish","format-standard","hentry","category-misc","category-pdo","tag-en"],"_links":{"self":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/182","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=182"}],"version-history":[{"count":1,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/182\/revisions"}],"predecessor-version":[{"id":692,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/182\/revisions\/692"}],"wp:attachment":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=182"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=182"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=182"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}