Testing the crazy idea

MadCoder had doubts about the crazy idea, so I took a little time to test it on a package I maintain: xulrunner.

First, fetch all the .deb files from the package's history (at least, everything available on snapshot.debian.net today).

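# Fetch the amd64 Packages index for xulrunner from snapshot.debian.net...
wget -O - -q http://snapshot.debian.net/archive/pool/x/xulrunner/binary-amd64/Packages.gz | gzip -cd > list
# ...and download every .deb file it lists.
awk '/^Filename:/{print $2}' list | xargs -I{} wget http://snapshot.debian.net/archive/{}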
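Before importing anything, it may be worth checking that every download is complete. Each Packages stanza carries an MD5sum field next to Filename, so the two can be paired and handed to md5sum -c. The following is a minimal bash sketch. The helper name verify_debs is hypothetical, and it assumes the .deb files sit in the current directory, where wget leaves them.

```bash
# verify_debs LIST (hypothetical helper): check the .deb files in the current
# directory against the MD5sum fields of the Packages stanzas in LIST.
verify_debs() {
  awk '
    # Remember the base name of the file and its checksum for each stanza.
    /^Filename:/ { n = split($2, parts, "/"); file = parts[n] }
    /^MD5sum:/   { sum = $2 }
    # A blank line ends a stanza: emit "checksum  file" for md5sum -c.
    /^$/ { if (file != "" && sum != "") print sum "  " file; file = ""; sum = "" }
    # The last stanza may not be followed by a blank line.
    END  { if (file != "" && sum != "") print sum "  " file }
  ' "$1" | md5sum -c -
}
```

Running `verify_debs list` prints one OK or FAILED line per file and exits non-zero if any file is missing or corrupt.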

Next, commit them in version order, using a separate git repository for each binary package name:

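# List the .deb file names from "list", sorted by Debian version (oldest first),
# and import each one as a new commit in its package's repository.
perl -e '
    # On newer dpkg-dev, vercmp may be unavailable; version_compare is the current name.
    use Dpkg::Version qw(vercmp);
    # package_VERSION_arch.deb -> VERSION
    sub v { my $f = $_[0]; $f =~ s/.*_(.*)_.*/$1/; $f }
    print sort { vercmp(v($a), v($b)) }
          map  { s/^Filename: .*\///; $_ }
          grep { /^Filename:/ } <>;
' list | while read f; do
    pkg=${f%_*_*}                 # strip _VERSION_arch.deb
    [ ! -d "$pkg" ] && mkdir "$pkg" && ( cd "$pkg" && git init -q )
    cd "$pkg"
    ar -x "../$f"                 # debian-binary, control.tar.gz, data.tar.gz
    mkdir data control
    tar -C data -zxf data.tar.gz
    tar -C control -zxf control.tar.gz
    git add -A data control       # -A also records files removed since the previous version
    git commit -q -m "$f"
    rm -rf data control data.tar.gz control.tar.gz debian-binary
    cd ..
done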
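The ordering step relies on Perl's Dpkg::Version. If only the dpkg binary is at hand, the same ordering can probably be obtained from dpkg --compare-versions. Below is a minimal bash sketch that performs an insertion sort on package_version_arch.deb names. The names sort_debs_by_version and deb_version are hypothetical. It calls dpkg once per comparison, which is slow, but should be acceptable for a few dozen uploads.

```bash
# deb_version NAME (hypothetical helper): package_VERSION_arch.deb -> VERSION
deb_version() {
  local v=${1#*_}    # drop "package_"
  echo "${v%_*}"     # drop "_arch.deb"
}

# sort_debs_by_version (hypothetical helper): read .deb file names, one per
# line, on stdin and print them ordered by Debian version, oldest first.
sort_debs_by_version() {
  local -a out=()
  local f v i j
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    v=$(deb_version "$f")
    # Insert f before the first entry whose version is strictly greater.
    i=${#out[@]}
    for j in "${!out[@]}"; do
      if dpkg --compare-versions "$v" lt "$(deb_version "${out[j]}")"; then
        i=$j
        break
      fi
    done
    out=("${out[@]:0:i}" "$f" "${out[@]:i}")
  done
  if [ ${#out[@]} -gt 0 ]; then
    printf '%s\n' "${out[@]}"
  fi
}
```

For example, `awk '/^Filename:/{print $2}' list | sed 's,.*/,,' | sort_debs_by_version` could stand in for the perl command at the head of the import loop.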

Finally, measure three sizes for each package: all of its .deb files combined; the raw git import (only the .git directory is left, index included, so removing the index would save a little space); and the "optimized" repository after git gc, using the default delta depth and window size (tuning those might improve the result further).

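awk '/^Package:/{print $2}' list | sort -u | while read p; do
    du -c --si "${p}"_*.deb | tail -1   # all .deb files for this package
    du -s --si "$p"                     # raw git import (only .git is left)
    ( cd "$p" && git gc --quiet )
    du -s --si "$p"                     # after git gc
    echo
done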

Sizes as printed by du --si (powers of 1000):

Package                    .deb files  git import  after git gc
libmozjs0d                        17M         16M          4.7M
libmozjs0d-dbg                    34M         31M           14M
libnspr4-0d                      4.7M        7.1M          1.3M
libnspr4-0d-dbg                  9.2M         12M          2.8M
libnss3-0d                        25M         28M          4.6M
libnss3-0d-dbg                    81M         61M           21M
libnss3-tools                     15M         16M          3.1M
libxul0d                         288M        265M          146M
libxul0d-dbg                     2.0G        1.8G          1.1G
python-xpcom                     5.1M        6.8M          1.1M
spidermonkey-bin                 2.5M        5.6M          861k
xulrunner                         13M         14M          2.0M
xulrunner-gnome-support          3.3M        5.9M          979k
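The --si figures are rounded, so ratios read off them are only approximate. The following is a minimal bash sketch for getting a percentage directly. It assumes the layout left by the import loop, with PKG_*.deb files sitting next to a PKG directory that holds only .git. The helper name pct_of_deb is hypothetical. Both sizes are taken as disk usage in 1 KiB blocks with du -k, matching what the measurements above report.

```bash
# pct_of_deb PKG (hypothetical helper): print the size of PKG/.git as a
# percentage of the combined size of all PKG_*.deb files, using disk usage
# in 1 KiB blocks as reported by du -k. Integer division truncates.
pct_of_deb() {
  local p=$1 deb git
  deb=$(du -ck "$p"_*.deb 2>/dev/null | tail -n 1 | cut -f1)
  deb=${deb:-0}
  git=$(du -sk "$p/.git" | cut -f1)
  if [ "$deb" -eq 0 ]; then
    echo "$p: no .deb data" >&2
    return 1
  fi
  echo "$p $(( git * 100 / deb ))%"
}
```

For instance, `for p in libmozjs0d xulrunner; do pct_of_deb "$p"; done` prints one "name NN%" line per package.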

So these packages, stored in git, take between about 15 percent (xulrunner) and 55 percent (libxul0d-dbg) of the size of their .deb files, which could be a nice improvement. It would be interesting to know how these numbers evolve over time. Some files, such as changelog.Debian.gz, would also benefit from being stored as plain text instead of gzipped.
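Storing changelogs uncompressed could be done in the import loop, between extracting data.tar.gz and running git add. The following is a minimal bash sketch. The helper name gunzip_changelogs is hypothetical, and so is the choice to treat only regular files named changelog*.gz. One caveat applies: rebuilding the exact original .deb would then also require recompressing each file byte for byte as before. gzip does not guarantee that, because its header records a timestamp and may record the original name, and its output depends on the compression level. That information would probably have to go into the deltas as well.

```bash
# gunzip_changelogs DIR (hypothetical helper): decompress every regular file
# named changelog*.gz below DIR in place, so that git's delta compression
# sees plain text instead of gzip output.
gunzip_changelogs() {
  find "$1" -type f -name 'changelog*.gz' -exec gunzip -f {} +
}
```

In the loop, `gunzip_changelogs data` would go right after `tar -C data -zxf data.tar.gz`.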

Note that git gc took a while and used a lot of memory for libxul0d-dbg. Also note that these figures don't include the delta files that would be needed to recreate the original .deb files, but those shouldn't make a huge difference.
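Both the memory use and the default delta settings can be tuned when repacking. The following is a minimal bash sketch, assuming git 1.8.5 or later for -C. The name repack_tight is hypothetical. The window and depth of 250 and the 256m window-memory cap are arbitrary starting points, not measured values. Larger windows and depths generally trade more CPU time for smaller packs. --window-memory caps the delta search memory per thread, and pack.threads=1 keeps that to a single thread.

```bash
# repack_tight REPO (hypothetical helper): rewrite all objects of REPO into a
# single pack, recomputing deltas with a larger window and depth, then print
# the repository size the same way the measurement loop does.
# The numbers below are assumptions to experiment with, not tuned values.
repack_tight() {
  local repo=$1
  git -C "$repo" -c pack.threads=1 repack -a -d -f -q \
      --window=250 --depth=250 --window-memory=256m || return
  du -s --si "$repo"
}
```

Running `repack_tight libxul0d-dbg` instead of git gc for that package would show whether the extra effort pays off.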

2008-02-24 22:42:11+0900

