Archive for the 'miscellaneous' Category

Emptying a deleted file

Yesterday, at work, we had the typical case where df says there is (almost) no space left on some device, while du doesn't see as much data as you would expect in such a situation. This happens when you delete a file that another process has opened (and, obviously, not yet closed).

In typical UNIX filesystems, files are actually only entries in a directory, pointing (linking) to the real information about the content, the inode.

The inode contains the information about how many such links exist on the filesystem, the link count. When you create a hard link (ln without -s), you create another file entry in some directory, linking to the same inode as the "original" file. You also increase the link count for the inode.
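
For illustration, link counts are easy to observe with stat from GNU coreutils (the file names here are made up; any scratch directory will do):

touch original
stat -c '%h link(s), inode %i' original            # 1 link
ln original hardlink                               # second directory entry, same inode
stat -c '%h link(s), inode %i' original hardlink   # both now report 2 links and the same inode
rm hardlink                                        # back to 1 link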

Likewise, when removing a file, the entry in the directory is removed (though most of the time, really only skipped, but that's another story), and the link count decreased. When the link count is zero, usually, the inode is marked as deleted.

Except when the usage count is not zero.

When a process opens a file, the kernel keeps a usage count for the corresponding inode in memory. When a process is reading from a file, it doesn't really expect it to suddenly disappear. So, as long as the usage count is not zero, even when the link count in the inode is zero, the content is kept on the disk and still takes space on the filesystem.

On the other hand, since there is no entry left in any directory linking to the inode, the size for this content can't be added to du's total.
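
The whole situation is easy to reproduce. Here is a minimal sketch, assuming a small scratch filesystem mounted on /mnt/scratch that you can safely fill:

dd if=/dev/zero of=/mnt/scratch/big.log bs=1M count=100   # create a 100MB file
tail -f /mnt/scratch/big.log &                            # some process keeps it open
rm /mnt/scratch/big.log                                   # the link count drops to 0...
df -h /mnt/scratch                                        # ...but df still counts the 100MB
du -sh /mnt/scratch                                       # while du doesn't see them anymore
kill %1                                                   # closing the file finally frees the space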

Back to our problem: the origin was that someone had to free some space on a 1GB filesystem, and thought a good idea would be to delete the 860MB log file that nobody cared about. Except that deleting it didn't actually free the space, and he didn't really check.

Later, the "filesystem full" problem came back to someone else, who came to ask me which files from a small list he could remove. But those files were pretty small, and removing them wouldn't have freed enough space. That gave me the feeling that we were probably in the typical case this post started with, which du -sk confirmed: 970MB used on the filesystem according to df, but only 110MB worth of data...

Just in case you need to find the pid of the process that still has the deleted file open, or even better, get access to the file itself, you can use the following command:

find -L /proc/*/fd -type f -links 0

(this works on Linux; remove -L on recent Solaris; on other OSes, you can find the pid with lsof)

Each path this command returns can be opened and read by a program such as cat, which gives access to the deleted content.
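
For instance, supposing the find command above printed /proc/1234/fd/7 (a made-up pid and file descriptor), a copy of the deleted content could be salvaged with something like:

pid=1234 fd=7
cat /proc/$pid/fd/$fd > /tmp/recovered.log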

I already addressed how to re-link such a file, which somehow works under Linux, but in my case, all that mattered was to really remove the file, this time. But we didn't know whether it was safe to stop the process still holding the file, nor how to properly restart it. We were left without a real resolution, but still needed to come up with something before the filesystem got completely full, while waiting to be able to deal with the root of the problem.

The first crazy idea I had was to attach a debugger to the process, and use it to close the file descriptor and open a new file instead (I think you can find some examples with google). But there was no debugger installed.

So, I had this other crazy idea: would truncate() work on these /proc/$pid/fd files?

You know what? It does work. So I bought us some time by running:

perl -e 'truncate("/proc/$pid/fd/$fd", 0);'

(somehow, there is no standard executable to do a truncate(), so I always resort to perl)

Afterwards, I also verified the same works under Linux (where you wouldn't really know what to expect, these /proc files being symbolic links to a path that doesn't exist anymore).

The following, even simpler, command works too.

> /proc/$pid/fd/$fd

It doesn't call truncate(), but open() with O_WRONLY | O_CREAT | O_TRUNC, followed by a close() right after (roughly speaking), which has the same effect.

Good to know, isn't it?

2008-11-06 22:23:33+0900

miscellaneous, p.d.o | 5 Comments »

Terry Tate is back, and he’s kicking ass

(Please wait through the first 45 seconds)

2008-10-19 09:40:51+0900

miscellaneous, p.d.o | Comments Off on Terry Tate is back, and he’s kicking ass

Testing the crazy idea

MadCoder had doubts about the crazy idea, so I took a little time to test it with a package I maintain, namely xulrunner.

First, take all the deb files in the history of the package (at least, everything that is available on snapshot.debian.net today).

wget -O - -q http://snapshot.debian.net/archive/pool/x/xulrunner/binary-amd64/Packages.gz | gzip -cd > list
awk '/^Filename:/{print $2}' list | xargs -I{} wget http://snapshot.debian.net/archive/{}

Next, commit all these in a different repo per package name:

perl -e 'use Dpkg::Version qw(vercmp); sub v { my $f = $_[0]; $f =~ s/.*_(.*)_.*/$1/; $f } print sort { vercmp(v($a), v($b)); } map { s/^Filename: .*\///; $_ } grep { /^Filename:/ } <>;' list | while read f; do
    pkg=${f%_*_*}   # strip the version and architecture from the file name
    [ ! -d $pkg ] && mkdir $pkg && ( cd $pkg ; git init )
    cd $pkg
    ar -x ../$f     # extract debian-binary, control.tar.gz and data.tar.gz
    mkdir data control
    tar -C data -zxf data.tar.gz
    tar -C control -zxf control.tar.gz
    git add data control
    git commit -q -m $f
    rm -rf data control data.tar.gz control.tar.gz debian-binary
    cd ..
done

Finally, evaluate, for each package, the respective sizes of all its .deb files, of their content imported into git (only the .git directory, including the index; some space could be gained by removing it), and of the "optimized" git repository (after git gc, without tweaking the delta depth or window size, which might even improve the result):

awk '/^Package:/{print $2}' list | sort -u | while read p; do
    du -c --si ${p}_*.deb | tail -1   # total size of the .deb files
    du -s --si $p                     # size of the git repository before git gc
    cd $p; git gc ; cd ..
    du -s --si $p                     # size of the git repository after git gc
    echo
done

17M total
16M libmozjs0d
4.7M libmozjs0d

34M total
31M libmozjs0d-dbg
14M libmozjs0d-dbg

4.7M total
7.1M libnspr4-0d
1.3M libnspr4-0d

9.2M total
12M libnspr4-0d-dbg
2.8M libnspr4-0d-dbg

25M total
28M libnss3-0d
4.6M libnss3-0d

81M total
61M libnss3-0d-dbg
21M libnss3-0d-dbg

15M total
16M libnss3-tools
3.1M libnss3-tools

288M total
265M libxul0d
146M libxul0d

2.0G total
1.8G libxul0d-dbg
1.1G libxul0d-dbg

5.1M total
6.8M python-xpcom
1.1M python-xpcom

2.5M total
5.6M spidermonkey-bin
861k spidermonkey-bin

13M total
14M xulrunner
2.0M xulrunner

3.3M total
5.9M xulrunner-gnome-support
979k xulrunner-gnome-support

So these packages, stored in git, take between 15 and roughly 50 percent of the .deb size, which may be a nice improvement. It would be interesting to know how these numbers evolve with time. Some files, such as the changelog.Debian.gz files, would also benefit from being stored in plain text instead of gzipped form.

Note that git gc took a while and a lot of memory for libxul0d-dbg. Also note these figures don't include the delta files that would be necessary to recreate the original .deb files, but this shouldn't make a huge difference.

2008-02-24 22:42:11+0900

miscellaneous, p.d.o | Comments Off on Testing the crazy idea

Crazy ideas

I often have a bunch of somewhat crazy ideas, and I don't have any time available to test or implement them, which is sad. So, just in case these crazy ideas might scratch someone's itch, I'm going to throw them into the wild.

I've been using git for a few months, now, and used it not only for source code management, but for efficient storage, too. *VERY* efficient. I'll have to write about that some day.

Anyway, while installing pristine-tar today, I thought it would be neat to have an equivalent pristine-deb, to store deb files efficiently. I'm pretty sure someone else has already thought about this possibility, but it's still better that such ideas come to the ears (eyes, actually) of someone who could implement them.

Such a pristine-deb tool could be used to... store packages from snapshot.debian.net. That would dramatically reduce the amount of space required for the archive, IMHO. I'm pretty sure old packages are not requested that much, so they could be generated on the fly by a CGI script answering the GET requests, so that URLs wouldn't change.

The same could probably be applied to archive.debian.org. It could even save enough space that archive.debian.org could host snapshot.debian.net. But that depends on the average package content and its average evolution, which I have absolutely no idea about.

Update: It would also be interesting to have the .diff.gz files in there, too; it would obviously make it easy to view the contents, such as copyright files, changelogs, and other bits of information available on packages.debian.org.

Update 2: Actually, pristine-deb would be as easy as storing 2 pristine-tars (one for control.tar.gz and one for data.tar.gz), and a debian-binary file. The .deb can then be reassembled with

ar -rc file.deb debian-binary control.tar.gz data.tar.gz
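
To make the idea a bit more concrete, here is a very rough sketch of what such a hypothetical pristine-deb could look like, on top of pristine-tar (the package name is made up, the commands assume an existing git repository, and pristine-tar requires the unpacked contents of each tarball to already be committed in it):

# storing a .deb
ar -x package_1.0_all.deb        # yields debian-binary, control.tar.gz and data.tar.gz
pristine-tar commit control.tar.gz
pristine-tar commit data.tar.gz
git add debian-binary
git commit -m 'debian-binary for package 1.0'

# regenerating it later
pristine-tar checkout control.tar.gz
pristine-tar checkout data.tar.gz
ar -rc package_1.0_all.deb debian-binary control.tar.gz data.tar.gz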

2008-02-24 12:52:57+0900

miscellaneous, p.d.o | 3 Comments »

Obligatory FOSDEM post

I'm NOT going to FOSDEM

2008-02-23 12:55:26+0900

miscellaneous, p.d.o | Comments Off on Obligatory FOSDEM post

Useless use of goto

Someone showed me some beautiful code that happens to be procmail's. I'm really impressed:

goto jiasc;
do
{ *++to= *++from;
jiasc:;
}
while(--count);

The goto merely skips the first copy, which makes the whole construct equivalent to a plain while(--count) *++to= *++from;, hence its uselessness. I suggest anyone take a look at procmail's source code, it's really a thrill.

2008-01-23 08:59:19+0900

miscellaneous, p.d.o | 9 Comments »

Laptop “upgrade” prices

I'm currently evaluating different laptops to decide what I'm going to buy to replace my current one. And while doing so, I came to the conclusion that you are often better off buying the cheapest model of a line and upgrading it yourself, at least if you're considering a Mac or a Vaio.

If you customize a Macbook on the French Apple web store, it costs you 140€ to replace the default 2 x 512MiB SO-DIMM DDR2 PC5300 with 2 x 1GiB, or 810€ for 2 x 2GiB. Such upgrades on other models apparently come in the same price range.

On the other hand, the most expensive 1GiB module I can find on materiel.net (a French web store selling computer parts, not necessarily the cheapest you can find) is priced 29.89€ and branded Kingston. That's a whopping 60€ to upgrade to 2 x 1GiB (instead of 140€), and you keep the original 2 x 512MiB! And a 2GiB module is "only" 104.95€, which puts the whole 4GiB at roughly 210€ instead of 810€. And again, you keep the original 2 x 512MiB.

I don't really know what Kingston memory modules are worth, but are the Apple ones made of gold? As far as I know, they're not even ECC, which could have justified the difference.

Now, on the same Macbook, it would cost you 140€ to replace the standard 80GB SATA 5400rpm hard drive with a 160GB one, and 280€ for a 250GB disk. On the same materiel.net site, I see a Western Digital 160GB 5200rpm SATA drive at 102.99€, and a 250GB one for 152.49€. So again, it's cheaper to buy the standard model and upgrade it yourself. And you keep the original disk as a bonus!

It works equally well on the American and Japanese Apple stores. And the same applies to Sony (though I couldn't find how to customize a Vaio laptop on the French site, I checked it was true on the Japanese and American web stores).

On the other hand, Dell and Lenovo seem to have much more reasonable upgrade prices.

What the fsck?

Update: The more I look at it, the less the memory thing makes sense to me. If it happens to actually be ECC, why don't they advertise it, since that would be more important (and would justify the amazing pricing) than the memory being SO-DIMM DDR2 PC5300 (which they do advertise)?

Update 2: Looking up dmidecode output with Google makes it clear that Apple and Sony laptops, at least the ones I checked prices for, don't use ECC memory.

2007-11-16 21:07:56+0900

miscellaneous, p.d.o | 3 Comments »

Gobuntu and Firefox

You may remember that, a while ago, Mark Shuttleworth announced there would be a 100% free version of Ubuntu Gutsy Gibbon:

Ubuntu 7.10 will feature a new flavour - as yet unnamed - which takes an ultra-orthodox view of licensing: no firmware, drivers, imagery, sounds, applications, or other content which do not include full source materials and come with full rights of modification, remixing and redistribution.

Later, we learned it would be named Gobuntu.

Well, they didn't quite keep their promise. Yes, Gobuntu includes Firefox, making it a pretty useless failed attempt.

By the way, I'm still amazed so many people believe it was all about the trademarks. For them, I'll quote something I wrote a year ago:

Trademark and copyright are different things. Mozilla® has unnecessarily given a non-free license to “clarify” the trademark situation, but that is not required. To make it clear: Debian thinks the logos are not free because they are not free. Period.

I'm glad at least Mark Pilgrim got it right.

Update: And as seen on Planet Mozilla, Robert Sayre obviously still hasn't understood the issue.

2007-10-19 07:43:57+0900

miscellaneous, p.d.o | 8 Comments »

Adding some VCS information in bash prompt

I don't spend a lot of time customizing my "working" environment nowadays, like enhancing my vim configuration or tweaking the shell. But when I read about MadCoder's zsh git-enabled prompt, I thought it was too convenient not to have something like that. Except I don't work with git only (sadly, but that's changing), and I don't like colours in my prompt (and a 2-line prompt is too much).

Anyways, since I have a bunch of directories in my $HOME that contain either svk, svn, mercurial, or git working trees, I thought it would be nice to have some information about all this on my prompt.

After a few iterations, here are the sample results:

mh@namakemono:~/dd/packages$
mh@namakemono:(svn)~/dd/packages/iceape[trunk:39972]debian/patches$
mh@namakemono:(svk)~/dd/packages/libxml2[trunk:1308]include/libxml$
mh@namakemono:(hg)~/moz/cvs-trunk-mirror[default]uriloader/exthandler$
mh@namakemono:(git)~/git/webkit[debian]JavaScriptCore/wtf$

The script follows, with a bit of explanation intertwined.

_bold=$(tput bold)
_normal=$(tput sgr0)

tput is a tool I only discovered recently, and it avoids having to know the escape codes. There are also options for cursor placement, colours, etc. It lives in the ncurses-bin package, if you want to play with it.
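
A few of its other capabilities, for the curious (these should work on any terminfo-aware terminal):

tput setaf 2    # switch the foreground colour to green
tput cup 0 0    # move the cursor to the top left corner
tput sgr0       # reset all attributes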

__vcs_dir() {
  local vcs base_dir sub_dir ref
  sub_dir() {
    local sub_dir
    sub_dir=$(readlink -f "${PWD}")
    sub_dir=${sub_dir#$1}
    echo ${sub_dir#/}
  }

We declare as much as possible as local (even functions), so that we avoid cluttering the whole environment. sub_dir is going to be used in several places below, which is why we declare it as a function. It outputs the current directory, relative to the directory given as argument.

  git_dir() {
    base_dir=$(git-rev-parse --show-cdup 2>/dev/null) || return 1
    base_dir=$(readlink -f "$base_dir/..")
    sub_dir=$(git-rev-parse --show-prefix)
    sub_dir=${sub_dir%/}
    ref=$(git-symbolic-ref -q HEAD || git-name-rev --name-only HEAD 2>/dev/null)
    ref=${ref#refs/heads/}
    vcs="git"
  }

This is the first function to detect a working tree, for git this time. Each of these functions sets the 4 variables we declared earlier: vcs, base_dir, sub_dir and ref. They are, respectively, the VCS type, the top-level directory of the working tree, the current directory relative to base_dir, and the branch, revision or reference in the repository, depending on the VCS in use. These functions return 1 if the current directory is not in a working tree of the VCS being considered.
The base directory of a git working tree can be deduced from the output of git-rev-parse --show-cdup, which gives the way up to the top-level directory, relative to the current directory. readlink -f then gives the canonical top-level directory. The current directory, relative to the top-level, is simply given by git-rev-parse --show-prefix.
git-name-rev --name-only HEAD gives a nice reference for the current HEAD, especially if you're on a detached head. But it can turn out to do a lot of work, introducing a slight lag the first time you cd into a git working tree, while most of the time HEAD is just a symbolic ref. This is why we first try git-symbolic-ref -q HEAD.
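
As an illustration (using the same dashed command names as in the script; recent git spells them git rev-parse, git symbolic-ref, etc.), here is roughly what these commands output from the webkit working tree shown in the prompt samples above:

$ cd ~/git/webkit/JavaScriptCore/wtf
$ git-rev-parse --show-cdup
../../
$ git-rev-parse --show-prefix
JavaScriptCore/wtf/
$ git-symbolic-ref HEAD
refs/heads/debian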

  svn_dir() {
    [ -d ".svn" ] || return 1
    base_dir="."
    while [ -d "$base_dir/../.svn" ]; do base_dir="$base_dir/.."; done
    base_dir=$(readlink -f "$base_dir")
    sub_dir=$(sub_dir "${base_dir}")
    ref=$(svn info "$base_dir" | awk '/^URL/ { sub(".*/","",$0); r=$0 } /^Revision/ { sub("[^0-9]*","",$0); print r":"$0 }')
    vcs="svn"
  }

Detecting an svn working tree is easier: every checked-out directory, top-level or not, contains a .svn subdirectory. We find the top-level directory by looking for the last directory containing a .svn subdirectory on the way up. This obviously doesn't work if you check out under another svn working tree, but I don't do such things.
For the ref, I wanted something like the name of the directory that has been checked out at the top-level directory (usually "trunk" or a branch name), followed by the revision number.
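
For illustration (with a made-up repository URL), the awk filter above turns svn info output into exactly that format, matching the [trunk:39972] part of the prompt samples:

printf 'URL: svn://svn.example.org/packages/iceape/trunk\nRevision: 39972\n' |
  awk '/^URL/ { sub(".*/","",$0); r=$0 } /^Revision/ { sub("[^0-9]*","",$0); print r":"$0 }'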

  svk_dir() {
    [ -f ~/.svk/config ] || return 1
    base_dir=$(awk '/: *$/ { sub(/^ */,"",$0); sub(/: *$/,"",$0); if (match("'${PWD}'", $0"(/|$)")) { print $0; d=1; } } /depotpath/ && d == 1 { sub(".*/","",$0); r=$0 } /revision/ && d == 1 { print r ":" $2; exit 1 }' ~/.svk/config) && return 1
    ref=${base_dir##*
}
    base_dir=${base_dir%%
*}
    sub_dir=$(sub_dir "${base_dir}")
    vcs="svk"
  }

svk doesn't keep repository files in the working tree, so we would have to ask svk itself whether the current directory is a working tree. Unfortunately, svk is quite slow at that (not that it takes several seconds, but it induces a noticeable delay when displaying the prompt), so we have to parse its config file ourselves. We avoid running awk twice by outputting both pieces of information we are looking for, separated by a newline, and then doing some tricks with bash variable expansion.

  hg_dir() {
    base_dir="."
    while [ ! -d "$base_dir/.hg" ]; do base_dir="$base_dir/.."; [ $(readlink -f "${base_dir}") = "/" ] && return 1; done
    base_dir=$(readlink -f "$base_dir")
    sub_dir=$(sub_dir "${base_dir}")
    ref=$(< "${base_dir}/.hg/branch")
    vcs="hg"
  }

I don't use mercurial much, but I happen to have exactly one working tree (a clone of http://hg.mozilla.org/cvs-trunk-mirror/), so I only gather some basic information. There is no way we can ask mercurial itself: it is too slow for that (the main culprit being the python interpreter startup), so we take what information we can (and since I don't know much about mercurial, that's really basic). Note that if you're deep in the filesystem tree, but not in a mercurial working tree, the while loop may be slow. I didn't bother looking for a better solution.

  git_dir ||
  svn_dir ||
  svk_dir ||
  hg_dir ||
  base_dir="$PWD"

Here we just run all these functions one by one, stopping at the first that matches. Adding some more for other VCS would be easy.

  echo "${vcs:+($vcs)}${_bold}${base_dir/$HOME/~}${_normal}${vcs:+[$ref]${_bold}${sub_dir}${_normal}}"
}
PS1='${debian_chroot:+($debian_chroot)}\u@\h:$(__vcs_dir)\$ '

Finally, we set up the prompt, so that it looks nice with all the gathered information.
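
To try it out, the whole snippet can simply go at the end of ~/.bashrc, or live in its own file (the name below is arbitrary) and be sourced from there:

. ~/.bash_vcs_prompt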

Update: made the last lines of the script a little better and factorized.

2007-10-14 13:24:44+0900

miscellaneous, p.d.o | 2 Comments »

Dead battery

A little while ago, my laptop battery started behaving strangely while charging, and the time the laptop would run on battery dropped significantly. Now, the battery seems to be just dead:

$ cat /proc/acpi/battery/BAT1/state
present: no

The sad thing is that this battery is (only) 3 years old, while the battery in my other laptop, more than 6 years old, is still alive (though it empties in half an hour).

2007-09-23 10:06:08+0900

miscellaneous, p.d.o | 2 Comments »