Testing the crazy idea
MadCoder had doubts about the crazy idea, so I took a little time to test it on a package I maintain, namely xulrunner.
First, take all the deb files in the history of the package (at least, everything that is available on snapshot.debian.net today).
wget -O - -q http://snapshot.debian.net/archive/pool/x/xulrunner/binary-amd64/Packages.gz | gzip -cd > list
awk '/^Filename:/{print $2}' list | xargs -I{} wget http://snapshot.debian.net/archive/{}
Next, commit all of these, in version order, into a separate repository per package name:
# Sort the file names by Debian version, oldest first, then import them one by one.
perl -e 'use Dpkg::Version qw(vercmp); sub v { my $f = $_[0]; $f =~ s/.*_(.*)_.*/$1/; $f } print sort { vercmp(v($a), v($b)); } map { s/^Filename: .*\///; $_ } grep { /^Filename:/ } <>;' list | while read -r f; do
    pkg=${f%_*_*}
    [ ! -d "$pkg" ] && mkdir "$pkg" && ( cd "$pkg" ; git init )
    cd "$pkg"
    ar -x "../$f"
    mkdir data control
    tar -C data -zxf data.tar.gz
    tar -C control -zxf control.tar.gz
    # -A also stages files that disappeared since the previous version
    git add -A data control
    git commit -q -m "$f"
    rm -rf data control data.tar.gz control.tar.gz debian-binary
    cd ..
done
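The `${f%_*_*}` expansion in the loop relies on the `name_version_arch.deb` naming of Debian packages: `%` removes the shortest suffix matching `_*_*`, which leaves only the package name. A small sketch, where `pkg_name` is a hypothetical helper and the file name is a made-up illustration, not one of the files downloaded above:

```shell
# Hypothetical helper: strip the "_version_arch.deb" suffix from a .deb file name.
pkg_name() {
    f=$1
    printf '%s\n' "${f%_*_*}"
}

# Illustrative file name; the version number is made up.
pkg_name libnspr4-0d_1.0-1_amd64.deb    # prints: libnspr4-0d
```

This works because neither package names nor version strings may contain an underscore, so the last two underscores always delimit the version.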
Finally, evaluate, for each package, the total size of all its .deb files, the size of their content imported in git (only the .git directory, including the index; some space could be gained by removing it), and the size of the "optimized" git repository (after git gc, without modifying the delta depth or window size, which might improve the result further):
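awk '/^Package:/{print $2}' list | sort -u | while read -r p; do
    du -c --si "${p}"_*.deb | tail -1    # total size of the .deb files
    du -s --si "$p"                      # repository before git gc
    ( cd "$p" && git gc )
    du -s --si "$p"                      # repository after git gc
    echo
done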
17M total
16M libmozjs0d
4.7M libmozjs0d

34M total
31M libmozjs0d-dbg
14M libmozjs0d-dbg

4.7M total
7.1M libnspr4-0d
1.3M libnspr4-0d

9.2M total
12M libnspr4-0d-dbg
2.8M libnspr4-0d-dbg

25M total
28M libnss3-0d
4.6M libnss3-0d

81M total
61M libnss3-0d-dbg
21M libnss3-0d-dbg

15M total
16M libnss3-tools
3.1M libnss3-tools

288M total
265M libxul0d
146M libxul0d

2.0G total
1.8G libxul0d-dbg
1.1G libxul0d-dbg

5.1M total
6.8M python-xpcom
1.1M python-xpcom

2.5M total
5.6M spidermonkey-bin
861k spidermonkey-bin

13M total
14M xulrunner
2.0M xulrunner

3.3M total
5.9M xulrunner-gnome-support
979k xulrunner-gnome-support
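The three figures printed for each package are the total size of its .deb files, the size of the repository before git gc, and its size after. One way to relate the first and the last is to express the packed size as a percentage of the .deb total. A minimal sketch in awk, using three rows copied from the output above; `ratio` is a hypothetical helper, and since du --si rounds its figures (in powers of 1000), the percentages are only approximate:

```shell
# Hypothetical helper: convert two "du --si" figures (e.g. 4.7M, 861k, 1.1G)
# to bytes and print the second as a percentage of the first.
ratio() {
    awk -v deb="$1" -v git="$2" '
        function bytes(s,    n, u) {
            n = s + 0                    # leading numeric part
            u = substr(s, length(s))     # unit suffix
            if (u == "k") return n * 1e3
            if (u == "M") return n * 1e6
            if (u == "G") return n * 1e9
            return n
        }
        BEGIN { printf "%.0f%%\n", 100 * bytes(git) / bytes(deb) }'
}

ratio 13M 2.0M     # xulrunner:  prints 15%
ratio 25M 4.6M     # libnss3-0d: prints 18%
ratio 288M 146M    # libxul0d:   prints 51%
```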
So these packages, stored in git, take between 15 and roughly 50 percent of the total size of their .deb files, which may be a nice improvement. It would be interesting to know how these numbers evolve over time. Some files, such as the changelog.Debian.gz files, would also benefit from being stored in plain text instead of gzipped form.
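The plain-text idea could be tried by decompressing the gzipped files in the extracted tree before they are added to git. A rough sketch of that step, not something that was part of the test above: `gunzip_tree` is a hypothetical helper, and recreating the exact original .deb would then also require recompressing these files identically.

```shell
# Hypothetical helper: decompress every regular *.gz file under a directory
# in place, so that git can delta the plain text between versions.
gunzip_tree() {
    find "$1" -type f -name '*.gz' -exec gunzip {} +
}

# In an import loop this would run just before "git add", e.g.:
#   gunzip_tree data
```

Restricting the search to regular files leaves symlinks to compressed files alone, though they would then point at names that no longer exist.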
Note that git gc took a while and a lot of memory for libxul0d-dbg. Also note that these figures don't include the delta files that would be necessary to recreate the original .deb files, but this shouldn't make a huge difference.
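Both the memory use and the untouched delta depth and window size can be adjusted through git's packing options. A hedged sketch, not measured on these repositories: `repack_tuned` is a hypothetical helper and the values are arbitrary placeholders. `pack.windowMemory` and `pack.threads` are standard git configuration keys; bounding them limits memory, possibly at the cost of worse deltas.

```shell
# Hypothetical helper: repack one repository with explicit delta settings.
# The values are placeholders, not recommendations.
repack_tuned() {
    (
        cd "$1" || exit 1
        git config pack.windowMemory 256m   # cap the memory used by the delta window, per thread
        git config pack.threads 1           # fewer threads, fewer windows held in memory
        # -a -d: put everything into a single pack and drop the old ones
        # -f: recompute deltas; --depth/--window: search harder for good deltas
        git repack -q -a -d -f --depth=50 --window=50
    )
}

# e.g.: repack_tuned libxul0d-dbg
```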
2008-02-24 22:42:11+0900