Archive for February, 2008

Testing the crazy idea

MadCoder had doubts about the crazy idea, so I took a little time to test it with a package I maintain, namely xulrunner.

First, take all the deb files in the history of the package (at least, everything that is available on today).

wget -O - -q | gzip -cd > list
awk '/^Filename:/{print $2}' list | xargs -I{} wget{}

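Next, commit all of these, in version order, into a separate repository per package name:

perl -e 'use Dpkg::Version qw(vercmp);
    # Extract the version field from a package_version_arch.deb file name.
    sub v { my $f = $_[0]; $f =~ s/.*_(.*)_.*/$1/; $f }
    # Print the .deb file names found in the list, oldest version first.
    print sort { vercmp(v($a), v($b)); } map { s/^Filename: .*\///; $_ } grep { /^Filename:/ } <>;' list |
while read f; do
    # Package name: the file name minus the _version_arch.deb suffix.
    pkg=${f%_*_*}
    [ ! -d $pkg ] && mkdir $pkg && ( cd $pkg ; git init )
    cd $pkg
    # Unpack the .deb, then both tarballs, and commit the extracted trees.
    ar -x ../$f
    mkdir data control
    tar -C data -zxf data.tar.gz
    tar -C control -zxf control.tar.gz
    git add data control
    git commit -q -m $f
    # Leave only the .git directory behind for the next version.
    rm -rf data control data.tar.gz control.tar.gz debian-binary
    cd ..
done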
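The loop above relies on the package_version_arch.deb naming scheme to find the package name. As a small sketch of how that parsing works with plain shell parameter expansion (the helper names deb_pkg and deb_version and the sample file name are made up for illustration):

```shell
# Hypothetical helpers, not part of the original commands.

# Strip the shortest suffix matching _*_* : this drops _version_arch.deb.
deb_pkg() { echo "${1%_*_*}"; }

# Drop everything up to the first underscore, then the trailing _arch.deb.
deb_version() { v=${1#*_}; echo "${v%_*}"; }

# Illustrative file name only.
deb_pkg xulrunner_1.8.1.12-1_i386.deb       # prints xulrunner
deb_version xulrunner_1.8.1.12-1_i386.deb   # prints 1.8.1.12-1
```

This works because Debian package names cannot contain underscores, so the two underscores in the file name are unambiguous.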

Finally, evaluate three sizes for each package: all the .deb files, their content imported in git (only the .git directory, including the index; some space could be gained by removing it), and the “optimized” git repository (after git gc, without modifying delta depth or window size, which might improve the result further).

awk '/^Package:/{print $2}' list | sort -u | while read p; do
    # Total size of all the .deb files for this package.
    du -c --si ${p}_*.deb | tail -1
    # Size of the git repository before repacking.
    du -s --si $p
    cd $p; git gc ; cd ..
    # Size of the git repository after git gc (the third figure below).
    du -s --si $p
    echo
done

Package                   .deb files   git     git after gc
libmozjs0d                17M          16M     4.7M
libmozjs0d-dbg            34M          31M     14M
libnspr4-0d               4.7M         7.1M    1.3M
libnspr4-0d-dbg           9.2M         12M     2.8M
libnss3-0d                25M          28M     4.6M
libnss3-0d-dbg            81M          61M     21M
libnss3-tools             15M          16M     3.1M
libxul0d                  288M         265M    146M
libxul0d-dbg              2.0G         1.8G    1.1G
python-xpcom              5.1M         6.8M    1.1M
spidermonkey-bin          2.5M         5.6M    861k
xulrunner                 13M          14M     2.0M
xulrunner-gnome-support   3.3M         5.9M    979k

So these packages, stored in git, take between 15 and roughly 50 percent of the .deb size, which may be a nice improvement. It would be interesting to know how these numbers evolve over time. Some files, such as the changelog.Debian.gz files, would also benefit from being stored in plain text instead of gzipped form.
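To make those percentages concrete, here is a small sketch that divides the size after git gc by the total .deb size. The ratio helper is a made-up name; the figures are taken from the results above, so they carry du's rounding.

```shell
# Hypothetical helper: $1 = total .deb size, $2 = size after git gc,
# both expressed in the same unit (megabytes here).
ratio() {
    awk -v deb="$1" -v git="$2" 'BEGIN { printf "%.0f%%\n", 100 * git / deb }'
}

ratio 13 2.0     # xulrunner: prints 15%
ratio 288 146    # libxul0d:  prints 51%
```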

Note that git gc took a while and a lot of memory for libxul0d-dbg. Also note that these numbers don’t include the delta files that would be necessary to recreate the original .deb files, but this shouldn’t make a huge difference.

2008-02-24 22:42:11+0900

miscellaneous, p.d.o

Crazy ideas

I often have a bunch of somewhat crazy ideas, and I don’t have any time available to test or implement them, which is sad. So just in case these crazy ideas scratch someone’s itch, I’m going to throw them into the wild.

I’ve been using git for a few months, now, and used it not only for source code management, but for efficient storage, too. VERY efficient. I’ll have to write about that some day.

Anyways, while installing pristine-tar, today, I just thought it would be neat to have an equivalent pristine-deb, to store deb files efficiently. I’m pretty sure someone else thought about this possibility, but it’s still better that such ideas come to the ears (eyes, actually) of someone that could implement them.

Such a pristine-deb tool could be used to… store packages from That would reduce the amount of space required for the archive dramatically, IMHO. I’m pretty sure old packages are not requested that much, so they could be generated on-the-fly from a CGI script placed as a GET action, so that urls wouldn’t change.

The same could probably be applied to It could even save enough space that could host But that depends on the average package content and its average evolution, which I have absolutely no idea about.

Update: It would also be interesting to have the .diff.gz files in there, too; that would obviously give an easy view of the contents, such as copyright files, changelogs, and other bits of information available on

Update 2: Actually, pristine-deb would be as easy as storing 2 pristine-tars (one for control.tar.gz and one for data.tar.gz), and a debian-binary file. The .deb can be aggregated with

ar -rc file.deb debian-binary control.tar.gz data.tar.gz
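As a rough sketch of the split and rebuild steps around that command (deb_split and deb_join are hypothetical helper names, and the pristine-tar handling of the two tarballs is left out), something like this could work. Note that ar records timestamps and ownership for each member, so the rebuilt file is not guaranteed to be byte-identical to the original without extra care.

```shell
# Hypothetical helpers, only illustrating the ar round trip.

# Extract the three members of the .deb given as $1 into directory $2.
deb_split() {
    deb=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    mkdir -p "$2"
    ( cd "$2" && ar -x "$deb" )
}

# Rebuild a .deb at path $2 from the members stored in directory $1.
# The member order matters: debian-binary has to come first.
deb_join() {
    out=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    rm -f "$out"
    ( cd "$1" && ar -rc "$out" debian-binary control.tar.gz data.tar.gz )
}
```

Usage would be along the lines of deb_split file.deb parts followed by deb_join parts rebuilt.deb.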

2008-02-24 12:52:57+0900

miscellaneous, p.d.o

Obligatory FOSDEM post

I'm NOT going to FOSDEM

2008-02-23 12:55:26+0900

miscellaneous, p.d.o