Archive for the 'miscellaneous' Category

Problems and expectations

What would you expect software such as VMware ESX Server, in its latest version, to do when it technically can't do what you would like it to do?

Well, I, for one, would expect it, at least, to tell me... but it doesn't. If you don't have VT enabled on a server with a 64-bit Intel processor, and want to run a 64-bit OS in a VM configured to host a 64-bit guest, it doesn't tell you. All you have to stare at is an error message from the guest OS saying that the processor doesn't support 64-bit instructions. You have to gather from this message that all it needs is for the VT extensions to be enabled.

Now, if you're not very familiar with these technical details, what would be your first test on such a server? I'd say, most probably, trying to run the 64-bit OS on the bare hardware... which would indeed succeed, leaving the user thoroughly confused.

Note that in the "processor" part of the configuration panel in the Virtual Infrastructure Client, while there is information about Hyperthreading being enabled or not, there is nothing about VT.

2007-08-22 20:38:35+0900

miscellaneous, p.d.o

One not so great thing about free software developers

... is that more than one are likely to have the same crazy ideas.

When I started to play with xulrunner, which according to my oldest post in the xulrunner category would be nearly 2 years ago, I had this crazy idea (and here, when I say crazy, I almost think dumb) that it would be neat to have a window manager based on libxul, able to display both "native" windows and XUL or HTML windows.

Indeed, it would be neat, as in web-2.0-hype or i-can-use-gmail-directly. But it would be nearly impossible to secure, and would introduce so many new ways to compromise users...

Guess what... It now exists.

Update: Wow, the GUADEC keynote slides are really full of crap. My favorite ones are about "The Fox": Good engineering practices and Small, extensible core.

2007-07-19 19:58:30+0900

miscellaneous, xulrunner

ffmpeg sucks

Guess what happens when someone complains to ffmpeg developers that their software management and API suck. Well, he is told repeatedly to include a CVS snapshot of ffmpeg and link statically against it, "like everyone does". Splendid.

2007-07-18 07:43:03+0900

miscellaneous, p.d.o

One great thing about free software developers

... is that more than one are likely to have the same crazy ideas.

A while ago, when I was fooling around with VMware ESX and wrote diskimgfs (which I hope to be able to release some day), I found another implementation of the same root idea (without partition level access, though) in dm-userspace.

A few months ago, Sam posted his code for an SSH/HTTPS port sharing program, an idea that had been on my mind for a few weeks.

And now that I'm playing around with git, I was thinking about having a FUSE filesystem to access a git repository. Guess what, it already exists.

And it actually happens quite often. This is also one of the reasons there are so many implementations of the same things (another being the NIH syndrome).

2007-07-11 23:29:51+0900

miscellaneous, p.d.o

Finding where a tarball came from with git

I got the idea reading this question on the webkit-dev list, and from my recollection of the git documentation. Well, for the webkit case, the basic script I put up would be of no help because the tarballs don't contain all the files (plus, they use subversion, not git, so it would also require a long importing process). Anyways...

Considering how SHA-1 hashes of objects are created with git, it is actually pretty easy to generate a SHA-1 hash from a random (non-git) tree, and then find the corresponding commit. First, you have git-hash-object that helps you create a hash for a particular object (though it's also trivial to do with sha1sum). For regular files, git-hash-object -t blob $filename is enough. For symbolic links, you have to read the link destination, and give it without a trailing character (be it the NUL character or a newline) to git-hash-object -t blob --stdin. For directories, you have to generate a tree "structure" by yourself and pass it to git-hash-object -t tree --stdin. I haven't bothered looking at other file types.
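As a rough sketch of the sha1sum route: a blob's hash is the SHA-1 of a short header ("blob", a space, the content size in bytes, a NUL byte) followed by the content itself. The helper names blob_sha1 and symlink_sha1 are made up for this illustration, and the sketch assumes sha1sum from coreutils is available.

```shell
# Hash a regular file the way git-hash-object -t blob would:
# SHA-1 over "blob <size>\0" followed by the file content.
blob_sha1() {
  size=$(( $(wc -c < "$1") ))
  { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Hash a symbolic link: the blob content is the link target,
# without any trailing newline or NUL byte.
# (Assumption: the target itself does not end with a newline.)
symlink_sha1() {
  target=$(readlink "$1")
  size=$(( $(printf '%s' "$target" | wc -c) ))
  { printf 'blob %d\0' "$size"; printf '%s' "$target"; } | sha1sum | cut -d' ' -f1
}
```

For an empty file, blob_sha1 should print e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the hash git gives to the empty blob.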

The tree structure can be guessed by either looking at mktree.c or at the output of git-cat-file tree $sha1 where $sha1 is the SHA-1 hash for a tree object. It contains three pieces of information for each node in the tree: the file mode, with the same format as what stat() returns, except that, for some reason, permissions are 000 for directories and symbolic links; the file name; and the SHA-1 hash. They are written in the following format: file mode in octal ASCII with no zero padding ("%o") followed by a space character, then the filename followed by a NUL character, and the binary form of the SHA-1 hash.
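To make the entry format concrete, here is a minimal sketch that writes one entry and hashes a tree made of such entries. tree_entry and tree_sha1 are hypothetical helper names; the sketch assumes sha1sum is available and that the entries file already holds the entries in git's order.

```shell
# Emit one tree entry on stdout: "<mode> <name>", a NUL byte,
# then the 20 raw bytes of the SHA-1 (passed in as 40 hex digits).
tree_entry() {
  entry_hex=$3
  printf '%s %s\0' "$1" "$2"
  while [ -n "$entry_hex" ]; do
    rest=${entry_hex#??}
    pair=${entry_hex%"$rest"}
    # Convert the hex pair to octal so that printf can emit the raw byte.
    printf "\\$(printf '%03o' "0x$pair")"
    entry_hex=$rest
  done
}

# Hash a file holding concatenated entries:
# SHA-1 over "tree <size>\0" followed by the entries.
tree_sha1() {
  size=$(( $(wc -c < "$1") ))
  { printf 'tree %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

For instance, tree_entry 100644 foo e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 > entries followed by tree_sha1 entries hashes a tree containing a single empty file named foo. An empty entries file gives 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git's empty tree.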

Nodes are sorted in a not quite lexical order (take a look at base_name_compare in read-cache.c) and are not separated by any special character: the mode of a file just follows the SHA-1 hash of its predecessor.
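As far as I understand base_name_compare, names are compared byte by byte, with a directory name treated as if it ended with a "/". A small sketch of that ordering, where sort_tree_names is a hypothetical helper reading "d name" or "f name" lines (names containing newlines are not handled):

```shell
# Sort tree entry names the way git appears to: bytewise,
# with directories compared as if their name had a trailing "/".
sort_tree_names() {
  while read -r kind name; do
    if [ "$kind" = d ]; then
      printf '%s/\n' "$name"
    else
      printf '%s\n' "$name"
    fi
  done | LC_ALL=C sort | sed 's,/$,,'
}
```

With a directory foo and files foo.c and foo-bar, this prints foo-bar, foo.c, foo: the directory comes last because "/" sorts after "-" and ".". If foo were a regular file, it would come first.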

With all this new knowledge, you should be able to write some code that would return the SHA-1 from an arbitrary directory. Okay, since you must be at least as lazy as I am, you can take the script I wrote.

Now, let's take a look at a real life case: what commit is the latest nightly snapshot for the Linux kernel from? First, download the latest snapshot patch and its baseline, and extract everything. Then, run my git-hash-tree.pl script with the directory containing the extracted kernel as an argument. It will return, after a while, the SHA-1 hash for the whole tree. During this long process, you also have plenty of time to git clone Linus's tree.

Once you're all done, you can search for the commit corresponding to the tree hash (let's call it $hash) with the following command :

git-rev-list --all | while read h; do git-cat-file commit $h | grep -q "^tree $hash" && echo $h && break; done
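The same search can probably be done in a single pass by asking git log for the commit and tree hashes together. This is only a sketch: commit_for_tree is a made-up helper, and it assumes a git recent enough to support the %H and %T format placeholders.

```shell
# Read "<commit> <tree>" pairs on stdin and print the first commit
# whose tree hash equals the argument.
commit_for_tree() {
  # Appending "" forces a string comparison, even for hashes
  # that happen to look like numbers.
  awk -v tree="$1" '($2 "") == (tree "") { print $1; exit }'
}

# Typical use inside a clone (needs git, so it is only shown here):
#   git log --all --format='%H %T' | commit_for_tree "$hash"
```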

If you just followed these steps, you have probably spent quite a while getting no result at all. There are actually two things that prevent this method from working properly with the Linux kernel nightlies:

  • The snapshot patches contain a change to the top Makefile that doesn't exist in the repository. You need to remove the -gitn from the EXTRAVERSION variable in the Makefile.
  • git diff only includes diff headers for removal of empty files, so if you apply the snapshot patch with the patch utility (and you can't apply it with git-apply since you don't have a .git directory), empty files that were marked as deleted will still be on your tree. It happens with the current snapshot patch (2.6.22-rc7-git6): it doesn't remove include/asm-blackfin/macros.h.
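Both fixes are easy to script. The sketch below only covers the Makefile part and assumes the line looks like "EXTRAVERSION = -rc7-git6"; strip_git_suffix is a hypothetical name.

```shell
# Drop a trailing "-gitN" from the EXTRAVERSION line of a Makefile
# read on stdin; every other line is passed through unchanged.
strip_git_suffix() {
  sed 's/^\(EXTRAVERSION[[:space:]]*=.*\)-git[0-9][0-9]*[[:space:]]*$/\1/'
}

# Typical use, from the top of the extracted tree:
#   strip_git_suffix < Makefile > Makefile.tmp && mv Makefile.tmp Makefile
#   rm include/asm-blackfin/macros.h   # the empty file left behind by patch
```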

Note this is a naive method, because I haven't dedicated much time going through git documentation and code to find better ways, if there are any. Also note it's pretty much worthless to do this with the kernel nightly snapshots, since a file containing the SHA-1 hash of the corresponding commit can be found alongside the patch.

I guess a similar method could be used with mercurial, though I could not find documentation detailing what the hashes are calculated from (I've not searched a lot, I must say, but for git, it was right before my eyes).

2007-07-07 15:32:07+0900

miscellaneous, p.d.o

Playing more with LVM, LUKS and the device mapper

Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which was quite a challenge, since I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL later, I worked out how to do what I wanted, and realized it was even possible to do in a shell script.

So here you are: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I'll detail in between how it works. As with the previous script, use at your own risk, it comes with no warranty.
Note you theoretically can still access the filesystems underneath without problems. It's an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-less and well-written solution.

set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{print $1 + $2}')
devsize=$(blockdev --getsz "$dev")
mchunk=8

The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.

  • luks will be the temporary file where we are going to create a temporary LUKS volume.
  • pvsize is the full size of the LVM physical volume, i.e. the size of all the physical extents (pv_size) + the LVM headers and metadata (pe_start).
  • devsize is the raw device size.
  • mchunk is the size in blocks of chunks for the mirror target, see below.

dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT

Next, we create a sparse luks file the same size as the device (in case luksFormat would use the size somehow, but I believe it doesn't), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file when an error occurs later (though during the conversion itself, it will be pointless).

cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF

We create a LUKS device, so that we can get the encryption key ($key), and the size of the LUKS header ($offset). Note you need to add --showkey to the dmsetup table command on sid.

if [ $(expr $devsize - $pvsize) -lt $offset ]; then
  echo Not enough free space after LVM physical volume >&2
  cryptsetup luksClose ${cryptdev}_real
  exit 1
fi

Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.

if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
  echo Last chunk size is not a multiple of $mchunk sectors >&2
  cryptsetup luksClose ${cryptdev}_real
  exit 1
fi

This is another check to avoid surprises at the end, when dealing with the last chunk. As the script is written for the moment, it doesn't support cases where the size of this last chunk is not a multiple of $mchunk. So we need to abort in those cases.
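To see what this check computes: the data is moved in chunks of $offset sectors starting from the end, so the last chunk moved (the one at the very beginning of the device) is whatever remains of the crypt device length, and since devsize is that length plus $offset, devsize % offset gives the same remainder. A small sketch with made-up helper names and example sizes:

```shell
# Size, in sectors, of the odd-sized last chunk
# (0 means every chunk is a full $offset sectors).
last_chunk_size() {
  length=$(( $1 - $2 ))   # devsize - offset: length of the crypt device
  echo $(( length % $2 ))
}

# Succeed when that chunk is a whole number of mirror regions.
last_chunk_ok() {
  [ $(( $(last_chunk_size "$1" "$2") % $3 )) -eq 0 ]
}
```

For a 1 GiB device (2097152 sectors) and a 1032-sector LUKS header, last_chunk_size 2097152 1032 prints 128, which is a multiple of 8, so the conversion could proceed.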

read major minor <<EOF
$(stat -t "$dev" | awk '{print $10,$11}')
EOF
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)/{print \$1}")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
  dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
  dmsetup resume "$map"
done

Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device.

Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list.

Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.

cursor=$length
chunk=$offset
while [ $cursor -gt 0 ]; do
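  # Use $(( )) here: expr exits with status 1 when the result is 0,
  # which would make set -e abort the script in the middle of the conversion.
  cursor=$(($cursor - $chunk))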
  if [ $cursor -lt 0 ]; then
    chunk=$(expr $chunk + $cursor)
    cursor=0
  fi
  (
  [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
  echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
  [ $cursor -lt $(expr $length - $chunk) ] && echo $(expr $cursor + $chunk) $(expr $length - $cursor - $chunk) crypt $format $key $(expr $IVoff + $cursor + $chunk) $dev $(expr $offset + $cursor + $chunk)
  ) | dmsetup reload "$cryptdev"
  dmsetup resume "$cryptdev"
  chunks=$(expr $chunk / $mchunk)
  while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
    true
  done
done

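This is where the main work is done: moving the data around. We actually just let the device mapper deal with the data duplication, $offset blocks by $offset blocks ($offset being the LUKS header size), using a mirror target for the chunk being moved. So, at any step, our disk has up to three zones: the data that has not been moved yet (linear), the chunk being moved (mirror), and the data that has already been moved and encrypted (crypt).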

We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
I haven't figured out a better way to wait for the end of the mirroring than looping on dmsetup status; dmsetup wait doesn't seem to be very helpful here.
Anyways, this is the part of the script where you don't want a crash to occur. Because if it does, all you can do is boot a rescue system, and try to find where the encrypted part of the disk starts in order to set up a device mapper by hand.
And you'd better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (/tmp in a default etch install is, for instance; note the script nevertheless works fine in this case). Also note the trap will remove the luks temporary file if the script exits...
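To get a feel for how the table evolves, here is a dry run of the loop that only prints the tables instead of loading them. It is a sketch: DEV, DEV_crypt_real, FORMAT and KEY are placeholders, the IV offset is assumed to be 0, and dry_run_tables is a made-up name.

```shell
# Print the tables the loop would load, one block per step.
# Arguments: crypt device length, LUKS header size, mirror region size.
dry_run_tables() {
  length=$1 offset=$2 mchunk=$3
  cursor=$length
  chunk=$offset
  while [ "$cursor" -gt 0 ]; do
    cursor=$((cursor - chunk))
    if [ "$cursor" -lt 0 ]; then
      # Last step: only the remainder is left to move.
      chunk=$((chunk + cursor))
      cursor=0
    fi
    end=$((cursor + chunk))
    echo "# step: moving sectors $cursor..$((end - 1))"
    # Zone 1: data not moved yet, still mapped linearly.
    if [ "$cursor" -ne 0 ]; then
      echo "0 $cursor linear DEV 0"
    fi
    # Zone 2: the chunk being copied to its encrypted location.
    echo "$cursor $chunk mirror core 1 $mchunk 2 DEV $cursor DEV_crypt_real $cursor"
    # Zone 3: data already moved, now read through the crypt target.
    if [ "$end" -lt "$length" ]; then
      echo "$end $((length - end)) crypt FORMAT KEY $end DEV $((offset + end))"
    fi
  done
}
```

Running dry_run_tables 40 16 8 shows three steps: the linear zone shrinks from 24 sectors to 8 and then disappears, while the crypt zone grows from nothing to 16 and then 32 sectors.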

dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null

Final steps of the conversion : our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and add the LUKS headers at the top of the device we converted.

At this point, the LUKS volume is set up just as if it had been set up by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM.

Now, you may want to add the following to your /etc/crypttab file:

$cryptdev $dev none luks

e.g. hda5_crypt /dev/hda5 none luks if the device was /dev/hda5.

And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):

update-initramfs -u

I tested this successfully under qemu and will give it a shot on my laptop some time soon.

Now, because I had a hard time finding much about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:

<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination > <start_sector> ] ...

logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type:

  • for core, num_params can be either 1 or 2.
    The first parameter is region_size, which is the size (in blocks) of the chunks the mirror log tracks for synchronization. It can't be less than the page size divided by 512, i.e. 8 on x86, must be a power of 2, and must not exceed num_sectors. Apparently, it is not a problem if num_sectors is not a multiple of region_size.
    The optional second parameter is either sync or nosync, the meaning of which I'm not sure about. I think it determines whether the mirror should do the initial synchronization (sync) or not (nosync).
  • for disk, num_params can be either 2 or 3, with params being log_device (device where the logs are kept), region_size, as above, and an optional sync or nosync, as above too.

num_mirrors is the number of mirrors and for each mirror, we have a pair destination and start_sector.
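The region_size constraints listed above are easy to check before loading a table. A minimal sketch, where valid_region_size is a hypothetical helper and the minimum defaults to 8 (a 4096-byte page divided by 512):

```shell
# Succeed when region_size is at least the minimum, is a power of 2,
# and does not exceed num_sectors.
# Example of a table line it would accept the region_size of:
#   0 1032 mirror core 1 8 2 /dev/hda5 0 /dev/mapper/hda5_crypt_real 0
valid_region_size() {
  region=$1 sectors=$2 min_region=${3:-8}
  [ "$region" -ge "$min_region" ] &&
  [ "$region" -le "$sectors" ] &&
  [ $(( region & (region - 1) )) -eq 0 ]
}
```

For example, valid_region_size 8 1032 succeeds, while 12 (not a power of 2), 4 (too small) and 2048 (larger than num_sectors) are all rejected.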

[ Update: after a quick look at the device mapper source code in Linus's git tree, updated the mirror target description ]

2007-06-24 12:27:21+0900

miscellaneous, p.d.o

Playing around with LVM, LUKS and the device mapper

I finally decided to switch my laptop to an LVM over LUKS setup instead of LVM alone. The problem is that there is currently no way that I know of to do this in-place. I do have an external disk on which I could pvmove everything and back, but that would not be very challenging ;). So I started wondering how doable that would be.

LUKS has an overhead of 1032 sectors (a sector being 512 bytes) in which it stores the LUKS header. Following the header, encrypted data just takes as much space as its decrypted counterpart, linearly. So all it would take to do an in-place LUKS conversion would be to move the data by 1032 sectors, and encrypt it. Which means the main problem is to have 1032 sectors free after the current volume.

Fortunately, LVM doesn't take up all the space on the partition a physical volume is set up on, unless you're unlucky or created partitions specifically so that it uses all the space. Because LVM physical volumes use the space in big chunks (4MB by default), if the size of the partition is not 4MB aligned (modulo the LVM header), you have some free space between the physical volume end and the partition end. We thus need this free space to be greater than 1032 sectors. On my laptop, there is more than 1.8MB free there, so there would be no problem.
Note the debian-installer seems to do more than a simple cryptsetup luksFormat, as LUKS volumes it creates have an overhead of 2056 sectors instead of the 1032 sectors I got creating a volume by hand.
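The free tail can be estimated with a bit of arithmetic, assuming the physical volume is made of a header of pe_start sectors followed by as many whole extents as fit. In this sketch, pv_tail_free is a made-up helper, and the partition size and the 384-sector pe_start are purely illustrative values:

```shell
# Sectors left unused at the end of the partition.
# Arguments: partition size, pe_start, extent size (default 8192 sectors, i.e. 4MB).
pv_tail_free() {
  devsize=$1 pe_start=$2 extent=${3:-8192}
  echo $(( (devsize - pe_start) % extent ))
}
```

With a partition of 1003000 sectors, pv_tail_free 1003000 384 prints 3192, which leaves enough room for a 1032-sector LUKS header (and also for the 2056 sectors used by debian-installer).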

Next problem is how to move the data and encrypt it in-place. Bonus points if that can be done without unmounting the filesystems.

My first guess was to play around with the device mapper and a specially crafted FUSE filesystem to create a fake LUKS volume that could be half encrypted and half not, and would evolve, with time, into a fully encrypted LUKS volume. Since I didn't know much about the device mapper, I looked at it a bit, and discovered a FUSE intermediate would be unnecessary: not only can you remap on-the-fly (which I already guessed was somehow possible, considering how LVM can do it, though I didn't know much about the interaction between LVM and the d-m), but you can also mix dm-crypt and linear chunks in the same d-m table.

It also appears LVM over LUKS has an overhead that is actually not necessary. How sad both use the device mapper but yet can't do an efficient setup with it.

When you setup LVM over LUKS, you open a LUKS volume, which will create a dm-crypt device mapper, and use this device mapper as the physical volume for LVM. Logical volumes created with LVM are actually device mappers using the physical volume as destination device. When you access a block on an LVM logical volume, your access is first mapped through the LV device mapper and then mapped again through the dm-crypt mapper. So it goes through 2 device mappers, but it could do with only one: the linear mapping used in most of the cases with LVM could be replaced by crypt mappings, that are just as linear, but also handle the encryption/decryption part.

Playing around under qemu with a freshly installed etch system, I was able to implement this and came up with the following script. It should work with most setups, but use at your own risk, it comes with no warranty.

MAJOR=$(awk '/device-mapper/{print $1;exit}' /proc/devices)
dmsetup ls --target linear | while read map extra; do
  TABLE=$(dmsetup table $map | while IFS=" :" read start length linear major minor offset; do
    map2=
    if [ "$major" -eq "$MAJOR" ]; then
      IFS=":" read map2 extra <<EOF
$(dmsetup info -c --noheadings -j $major -m $minor)
EOF
    fi
    if [ -z "$map2" ] || [ $(dmsetup table "$map2" | wc -l) -gt 1 ] || [ $(dmsetup table --target crypt "$map2" | wc -l) -eq 0 ]; then
      echo $start $length $linear $major:$minor $offset
    else
      IFS=" " read cstart clength crypt format key IVoff dev coffset <<EOF
$(dmsetup table "$map2")
EOF
      echo $start $length crypt $format $key $(expr $IVoff + $offset) $dev $(expr $coffset + $offset)
    fi
  done)
  if (dmsetup table $map; echo "$TABLE") | sort | uniq -c | grep -v "^ *2 " > /dev/null 2>&1;
  then
    echo Diverting $map
    dmsetup reload $map <<EOF
$TABLE
EOF
    dmsetup resume $map
  fi
done

[ Update: There was a dmsetup resume missing in the script ]

After running this script, you should be able to run cryptsetup luksClose on your LUKS volume. Once you've done it, pvdisplay and other LVM tools will show you nothing, because the physical volume will just have disappeared... but everything will still be in place. Note that if you leave the LUKS volume open, you'd better be careful NOT to do operations with LVM (such as pvmove) because it could break everything.

Note that on sid, you would need to add a --showkey option into the $(dmsetup table "$map2") line. This option is necessary for dmsetup to display the encryption keys.
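The core of the rewrite can be tried on plain strings, without touching dmsetup. The sketch below folds one linear line of a logical volume into the crypt line of the volume underneath, the same way the script computes it; merge_tables is a hypothetical name and the table lines used with it are invented examples.

```shell
# $1: "start length linear dev offset" (a logical volume segment)
# $2: "start length crypt format key ivoff dev offset" (the LUKS mapping)
# Prints the equivalent single crypt mapping for that segment.
merge_tables() {
  read -r start length skip1 skip2 offset <<EOF
$1
EOF
  read -r skip1 skip2 skip3 format key ivoff dev coffset <<EOF
$2
EOF
  # Both the IV offset and the device offset shift by the linear offset.
  echo "$start $length crypt $format $key $((ivoff + offset)) $dev $((coffset + offset))"
}
```

For instance, a segment "0 204800 linear 254:0 384" on top of "0 2096120 crypt aes-cbc-essiv:sha256 00ff 0 8:5 1032" becomes "0 204800 crypt aes-cbc-essiv:sha256 00ff 384 8:5 1416".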

I wonder if LVM upstream would consider supporting LUKS volumes directly...

Anyways, back to the in-place conversion to LVM over LUKS. The idea would be to:

  • Create a file the size of the LUKS headers + one block.
  • Create a loopback device for this file.
  • luksFormat the loopback device.
  • luksOpen the loopback device, so that dmsetup can give us the key to use. I wonder if there are ways to get the key not involving the creation of the device mapper.
  • luksClose it, because we actually don't need it
  • Create a device mapper that linearly maps the whole LVM physical volume (i.e. the real partition, but without the empty tail).
  • Remap the LVM device mappers so that they use our own device mapper instead of the partition as physical volume.
  • Here is the trickiest part, and I still have to take a look at how pvmove does it to know exactly how this can be done (I guess it will require an additional device mapper): move data chunk by chunk (offsetting with the size of the LUKS headers), starting from the end of the LVM physical volume, and encrypt at the same time. Once a chunk is moved, adjust our device mapper so that it maps to the linear unmoved data followed by the moved encrypted data. As I said earlier, this is a setup that works.
  • Once everything is moved, we are in a situation where our device mapper is a single crypt mapping, in the same shape as a luksOpen would have made it. We can now copy the LUKS headers at the beginning of the partition, and add the appropriate configuration to /etc/crypttab.

If I got it all right, both LUKS and LVM should be believing they have done this setup themselves, so unmounting, lvchange, etc. should just be fine. It might be necessary to regenerate the initramfs, though.

I'll tell you when I fuck up my hard disk... er, try it.

To be continued...

2007-06-20 19:37:16+0900

miscellaneous, p.d.o

Sudoku perverts

You might remember the Sudoku solver in XSLT. Now let me introduce you to another breed of perverts, with solvers in SQL and regular expressions.

2007-06-10 08:17:09+0900

miscellaneous, p.d.o

Mozilla feature request response time

Eddy has been exposing some of the long standing feature requests for Mozilla. They have a huge record of such requests. My favourites:

And these requests are only a small subset...

On the other hand, I'm amazed the 1.8.1 branch (the one for Firefox 2.0) is ABI compatible with 1.8.0 (Firefox 1.5). I'm currently running epiphany with a pre-release of libxul 1.8.1.3 (coming soon in sid), and it works flawlessly (so far), without rebuilding epiphany.
I must say I'm impressed, especially when I look back at how 1.7.x releases were.

2007-05-23 19:28:34+0900

miscellaneous, p.d.o

Numbers

13256278887989457651018865901401704613 is a prime.
13256278887989457651018865901401704671 is the following prime.
Between them exists an (allegedly) forbidden number. If you don't know how to find it, it's 27 more than the first and 31 less than the second. Oops.
Edit: Or you could multiply the following primes: 2, 5, 19, 12043, 216493, and 836256503069278983442067 and then multiply by 32.

2007-05-02 20:16:38+0900

miscellaneous, p.d.o