Archive for June, 2007

Playing more with LVM, LUKS and the device mapper

Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which is quite a challenge, considering I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL (use the source, Luke) later, I had worked out how to do what I wanted, and realized it was even possible to do it in a shell script.

So here you are: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I'll detail how it works in between the pieces. As with the previous script, use at your own risk, it comes with no warranty.
Note that you can theoretically still access the filesystems underneath without problems: it's an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-free and well-written solution.

set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{print $1 + $2}')
devsize=$(blockdev --getsz "$dev")
mchunk=8

The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.

luks will be the temporary file where we are going to create a temporary LUKS volume.
pvsize is the full size of the LVM physical volume, i.e. the size of all the physical extents (pv_size) + the LVM headers and metadata (pe_start).
devsize is the raw device size.
mchunk is the size in blocks of chunks for the mirror target, see below.
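
To make the pvsize computation concrete, here is a small sketch that feeds a made-up pvs output line through the same awk expression. The two numbers (a 384-sector header area and a 20963328-sector extent area) are assumptions chosen for illustration, not values from a real system.

```shell
# Hypothetical output of:
#   pvs -o pe_start,pv_size --units s --noheadings --nosuffix /dev/hda5
sample_pvs_line="   384   20963328"

# Same awk expression as in the script: pe_start + pv_size, in 512-byte sectors.
pvsize=$(echo "$sample_pvs_line" | awk '{print $1 + $2}')
echo "$pvsize"
```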

dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT

Next, we create a sparse luks file the same size as the device (in case luksFormat would use the size somehow, but I believe it doesn't), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file when an error occurs later (though during the conversion itself, it will be pointless).
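
The sparse-file trick can be tried on its own, without root and without a loopback device. This is only a sketch: the 2048-sector size is an arbitrary assumption, and if=/dev/null is added here just to make the (empty) input explicit.

```shell
# Create a sparse file of 2048 sectors (1 MiB) the same way the script does:
# dd copies nothing (count=0) but extends the file up to the seek position.
sectors=2048
sparse=$(mktemp)
dd if=/dev/null of="$sparse" seek=$sectors count=0 bs=512 2> /dev/null

# The apparent size is sectors * 512 bytes, even though no data was written.
apparent_bytes=$(wc -c < "$sparse" | tr -d ' ')
rm -f "$sparse"
echo "$apparent_bytes"
```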

cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF

We create a LUKS device, so that we can get the encryption key ($key), and the size of the LUKS header ($offset). Note you need to add --showkey to the dmsetup table command on sid.
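
As an illustration of what the read above extracts, here is the same parsing applied to a made-up table line. The cipher name, the zeroed key and the 7:0 loop device are assumptions; only the 1032-sector header size comes from the previous entry.

```shell
# Hypothetical output of "dmsetup table hda5_crypt_real" (key shortened and zeroed).
sample_table="0 20970488 crypt aes-cbc-essiv:sha256 00000000000000000000000000000000 0 7:0 1032"

# Same field names as in the script.
read start length crypt format key IVoff cdev offset <<EOF
$sample_table
EOF

echo "data sectors: $length, LUKS header sectors: $offset"
```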

if [ $(expr $devsize - $pvsize) -lt $offset ]; then
  echo Not enough free space after LVM physical volume
  cryptsetup luksClose ${cryptdev}_real
  exit
fi

Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.
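
The same check, sketched as a standalone function with sample numbers. The function name and the sizes are assumptions for illustration; only the comparison mirrors the script.

```shell
# Succeeds when the tail after the physical volume can absorb the LUKS header.
# All arguments are in 512-byte sectors.
has_room_for_header() {
  devsize=$1 pvsize=$2 offset=$3
  [ $((devsize - pvsize)) -ge "$offset" ]
}

# Sample numbers (assumptions): 3000 free sectors first, then only 1000.
if has_room_for_header 20971520 20968520 1032; then echo "enough room"; fi
if ! has_room_for_header 20971520 20970520 1032; then echo "not enough room"; fi
```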

if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
  echo Last
  cryptsetup luksClose ${cryptdev}_real
  exit
fi

This is another check to avoid surprises at the end, when dealing with the last chunk. As the script is currently written, it doesn't support cases where the size of this last chunk is not a multiple of $mchunk, so we need to abort in these cases.
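
A quick worked example of that arithmetic, assuming (as the script does) that the size of the last chunk is the device size modulo the LUKS header size. The function name and the device sizes are made up.

```shell
# Succeeds when the leftover chunk is a whole number of mirror regions.
last_chunk_ok() {
  devsize=$1 offset=$2 mchunk=$3
  [ $((devsize % offset % mchunk)) -eq 0 ]
}

# Size of the last chunk for a 20971520-sector device and a 1032-sector header.
echo $((20971520 % 1032))   # prints 248, which is 31 regions of 8 sectors
```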

read major minor <<EOF
$(stat -t "$dev" | awk '{print $10,$11}')
EOF
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)$/{print \$1}")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
  dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
  dmsetup resume "$map"
done

Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device.

Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list.

Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.
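
To see what the sed rewrite does to an LVM table, here is a dry run on a made-up table line. The device numbers and names are assumptions, and nothing is passed to dmsetup.

```shell
# Hypothetical values: /dev/hda5 is block device 3:5.
major=3
minor=5
cryptdev=hda5_crypt

# Hypothetical table of an LVM logical volume sitting directly on /dev/hda5.
lv_table="0 2097152 linear 3:5 384"

# Same sed expression as in the script, without the final dmsetup reload.
new_table=$(echo "$lv_table" | sed s,$major:$minor,/dev/mapper/$cryptdev,)
echo "$new_table"
```

The logical volume keeps its start, length and offset; only its backend changes from the real device to the intermediate mapper.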

cursor=$length
chunk=$offset
while [ $cursor -gt 0 ]; do
  cursor=$(expr $cursor - $chunk)
  if [ $cursor -lt 0 ]; then
    chunk=$(expr $chunk + $cursor)
    cursor=0
  fi
  (
    [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
    echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
    [ $cursor -lt $(expr $length - $chunk) ] && echo $(expr $cursor + $chunk) $(expr $length - $cursor - $chunk) crypt $format $key $(expr $IVoff + $cursor + $chunk) $dev $(expr $offset + $cursor + $chunk)
  ) | dmsetup reload "$cryptdev"
  dmsetup resume "$cryptdev"
  chunks=$(expr $chunk / $mchunk)
  while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
    true
  done
done

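This is where the main work is done: moving the data around. We actually just let the device mapper deal with the data duplication, $offset blocks by $offset blocks ($offset being the LUKS header size), using a mirror target for the chunk being moved. So at each step, our disk is split into up to three parts: the data that has not been moved yet (linear), the chunk being moved (mirror), and the data that has already been moved and encrypted (crypt).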

We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
I haven't figured out a better way to wait for the end of the mirroring than to loop on dmsetup status; dmsetup wait doesn't seem to be very helpful here.
Anyways, this is the part of the script where you don't want a crash to occur. If it does, all you can do is boot a rescue system, and try to find where the encrypted part of the disk starts in order to set up a device mapper by hand.
And you'd better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (/tmp in a default etch install is, for instance; note the script nevertheless works fine in this case). Also note the trap will remove the luks temporary file if the script exits...
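
For the curious, the table loaded at each step can be printed without touching any device. This is a dry-run sketch: table_for is a hypothetical helper, and the sizes, cipher name and KEY placeholder are tiny made-up values; only the three lines it prints follow the loop above.

```shell
# Dry run: print the table the conversion loop would load for one step.
# Nothing is sent to dmsetup; every value below is a sample assumption.
dev=/dev/hda5
cryptdev=hda5_crypt
format=aes-cbc-essiv:sha256
key=KEY
IVoff=0
length=24
offset=8
mchunk=8

table_for() {
  cursor=$1 chunk=$2
  end=$((cursor + chunk))
  # Data before the chunk has not been moved yet: plain linear mapping.
  [ "$cursor" -ne 0 ] && echo "0 $cursor linear $dev 0"
  # The chunk being moved: mirrored from the raw device to the crypt device.
  echo "$cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor"
  # Data after the chunk is already encrypted and shifted by the header size.
  [ "$end" -lt "$length" ] && echo "$end $((length - end)) crypt $format $key $((IVoff + end)) $dev $((offset + end))"
  return 0
}

table_for 8 8
```

With these numbers, the middle step prints a linear line for sectors 0 to 7, a mirror line for sectors 8 to 15, and a crypt line for sectors 16 to 23 reading from device offset 24.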

dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null

Final steps of the conversion: our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and write the LUKS header at the top of the device we converted.

At this point, the LUKS volume is set up just as if it had been set up by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM.

Now, you may want to add the following to your /etc/crypttab file:

$cryptdev $dev none luks

For example, hda5_crypt /dev/hda5 none luks if the device was /dev/hda5.
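
If you'd rather generate that line than type it, a minimal sketch follows, using the same naming as the script. /dev/hda5 is just the sample device from above; appending the result to /etc/crypttab is left to you.

```shell
dev=/dev/hda5                        # hypothetical device
cryptdev=$(basename "$dev")_crypt    # same naming as in the script
crypttab_line="$cryptdev $dev none luks"
echo "$crypttab_line"
```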

And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):

update-initramfs -u

I tested this successfully under qemu and will give it a shot on my laptop some time soon.

Now, because I had a hard time finding anything about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:

<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination> <start_sector> ] ...

logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type:

for core, num_params can be either 1 or 2.
The first parameter is region_size, which is the size (in blocks) of the chunks the mirror log tracks for synchronization. It can't be less than the page size divided by 512, i.e. 8 on x86, must be a power of 2, and must not exceed num_sectors. Apparently, it is not a problem if num_sectors is not a multiple of region_size.
The optional second parameter is either sync or nosync. I'm not sure of its exact meaning, but I think it determines whether the mirror should do the initial synchronization (sync) or not (nosync).
for disk, num_params can be either 2 or 3, with params being log_device (device where the logs are kept), region_size, as above, and an optional sync or nosync, as above too.

num_mirrors is the number of mirrors and for each mirror, we have a pair destination and start_sector.
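
Here is a small sketch that puts the description above to work: one function checks the region_size constraints as I understand them, another builds a two-way mirror line with a core log. Both function names and the sample devices are assumptions, and the output is only printed, not loaded.

```shell
# Check the region_size constraints listed above (sizes in 512-byte sectors).
# min_region defaults to 8, i.e. a 4096-byte page divided by 512.
valid_region_size() {
  region=$1 num_sectors=$2 min_region=${3:-8}
  [ "$region" -ge "$min_region" ] &&
    [ $((region & (region - 1))) -eq 0 ] &&
    [ "$region" -le "$num_sectors" ]
}

# Build a two-way mirror line with an in-memory (core) log and one parameter.
mirror_line() {
  start=$1 len=$2 region=$3 src=$4 dst=$5
  echo "$start $len mirror core 1 $region 2 $src $start $dst $start"
}

if valid_region_size 8 1032; then
  mirror_line 0 1032 8 /dev/hda5 /dev/mapper/hda5_crypt_real
fi
```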

[ Update: after a quick look at the device mapper source code in Linus's git tree, updated the mirror target description ]

2007-06-24 12:27:21+0900

miscellaneous, p.d.o

Videos of Debconf 7

The good thing about not being able to attend Debconf is that you can still watch the talks online. Well, when you can find the videos.

While it's pretty easy to find the video streams (which, when you're a working European, are pretty much of no use), I couldn't find a single link to the video archives that were supposed to be available the same day.

Why oh why does the debconf7 homepage lack such an important link? Why does video.debconf.org show a phpmyadmin login form?

Anyways, for those who, like me, haven't found a link, here's one: Video archives for Debconf 7.

2007-06-24 08:34:56+0900

debian

Playing around with LVM, LUKS and the device mapper

I finally decided to switch my laptop to an LVM over LUKS setup instead of LVM alone. The problem is that there is currently no way that I know of to do this in-place. I do have an external disk to which I could pvmove everything and back, but that would not be very challenging ;). So I started wondering how doable that would be.

LUKS has an overhead of 1032 sectors (a sector being 512 bytes), in which it stores the LUKS header. Following the header, encrypted data takes just as much space as its decrypted counterpart, linearly. So all it would take to do an in-place LUKS conversion would be to move the data by 1032 sectors, and encrypt it. Which means the main problem is to have 1032 sectors free after the current volume.

Fortunately, LVM doesn't take up all the space on the partition a physical volume is set up on, except if you're unlucky or created partitions specifically so that it uses all the space. Because LVM physical volumes use the space in big chunks (4MB by default), if the size of the partition is not 4MB aligned (modulo the LVM header), you have some free space between the end of the physical volume and the end of the partition. We thus need this free space to be at least 1032 sectors. On my laptop, there is more than 1.8MB free there, so there would be no problem.
Note the debian-installer seems to do more than a simple cryptsetup luksFormat, as LUKS volumes it creates have an overhead of 2056 sectors instead of the 1032 sectors I got creating a volume by hand.
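
To put numbers on that free space, here is the arithmetic for a made-up partition. The partition size and the 384-sector LVM header area are assumptions; the 4MB extent size is the default mentioned above.

```shell
# Sample numbers (assumptions), all in 512-byte sectors.
partition=20000000   # partition size
pe_start=384         # LVM header and metadata before the first extent
extent=8192          # one 4MB physical extent

usable=$((partition - pe_start))
extents=$((usable / extent))
free_tail=$((usable - extents * extent))
echo "$extents extents, $free_tail free sectors after the physical volume"
```

With these sample values the tail is 2944 sectors, comfortably more than the 1032 needed for the header.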

Next problem is how to move the data and encrypt it in-place. Bonus points if that can be done without unmounting the filesystems.

My first guess was to play around with the device mapper and a specially crafted FUSE filesystem to create a fake LUKS volume that could be half encrypted and half not, and would evolve, with time, into a fully encrypted LUKS volume. Since I didn't know much about the device mapper, I looked at it a bit, and discovered a FUSE intermediate would be unnecessary: not only can you remap on-the-fly (which I already guessed was somehow possible, considering how LVM can do it, though I didn't know much about the interaction between LVM and the d-m), but you can also mix dm-crypt and linear chunks in the same d-m table.

It also appears LVM over LUKS has an overhead that is actually not necessary. How sad that both use the device mapper and yet can't do an efficient setup with it.

When you set up LVM over LUKS, you open a LUKS volume, which will create a dm-crypt device mapper, and use this device mapper as the physical volume for LVM. Logical volumes created with LVM are actually device mappers using the physical volume as destination device. When you access a block on an LVM logical volume, your access is first mapped through the LV device mapper and then mapped again through the dm-crypt mapper. So it goes through 2 device mappers, but it could do with only one: the linear mapping used in most of the cases with LVM could be replaced by crypt mappings, that are just as linear, but also handle the encryption/decryption part.
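
The collapse of the two mappings into one boils down to adding offsets. Here is a sketch on made-up tables: the device numbers, sizes, cipher name and KEY placeholder are all assumptions, and nothing is loaded into the device mapper.

```shell
# Hypothetical tables. The LV maps linearly onto the dm-crypt device 254:0,
# which itself maps onto /dev/hda5 (3:5) after a 1032-sector LUKS header.
lv_line="0 2097152 linear 254:0 384"
crypt_line="0 20970488 crypt aes-cbc-essiv:sha256 KEY 0 3:5 1032"

read start length linear backend offset <<EOF
$lv_line
EOF
read cstart clength crypt format key IVoff dev coffset <<EOF
$crypt_line
EOF

# One crypt mapping replaces the linear-over-crypt stack: both the IV offset
# and the device offset are shifted by the LV's offset into the crypt device.
merged="$start $length crypt $format $key $((IVoff + offset)) $dev $((coffset + offset))"
echo "$merged"
```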

Playing around under qemu with a freshly installed etch system, I was able to implement this and came up with the following script. It should work with most setups, but use at your own risk, it comes with no warranty.

MAJOR=$(awk '/device-mapper/{print $1;exit}' /proc/devices)
dmsetup ls --target linear | while read map extra; do
  TABLE=$(dmsetup table $map | while IFS=" :" read start length linear major minor offset; do
    map2=
    if [ "$major" -eq "$MAJOR" ]; then
      IFS=":" read map2 extra <<EOF
$(dmsetup info -c --noheadings -j $major -m $minor)
EOF
    fi
    if [ -z "$map2" ] || [ $(dmsetup table "$map2" | wc -l) -gt 1 ] || [ $(dmsetup table --target crypt "$map2" | wc -l) -eq 0 ]; then
      echo $start $length $linear $major:$minor $offset
    else
      IFS=" " read cstart clength crypt format key IVoff dev coffset <<EOF
$(dmsetup table "$map2")
EOF
      echo $start $length crypt $format $key $(expr $IVoff + $offset) $dev $(expr $coffset + $offset)
    fi
  done)
  if (dmsetup table $map; echo "$TABLE") | sort | uniq -c | grep -v "^ *2 " > /dev/null 2>&1; then
    echo Diverting $map
    dmsetup reload $map <<EOF
$TABLE
EOF
    dmsetup resume $map
  fi
done

[ Update: There was a dmsetup resume missing in the script ]

After running this script, you should be able to run cryptsetup luksClose on your LUKS volume. Once you've done it, pvdisplay and other LVM tools will show you nothing, because the physical volume will just have disappeared... but everything will still be in place. Note that if you leave the LUKS volume open, you'd better be careful NOT to do operations with LVM (such as pvmove) because it could break everything.

Note that on sid, you would need to add a --showkey option into the $(dmsetup table "$map2") line. This option is necessary for dmsetup to display the encryption keys.

I wonder if LVM upstream would consider supporting LUKS volumes directly...

Anyways, back to the in-place conversion to LVM over LUKS. The idea would be to:

Create a file the size of the LUKS headers + one block.
Create a loopback device for this file.
luksFormat the loopback device.
luksOpen the loopback device, so that dmsetup can give us the key to use. I wonder if there are ways to get the key not involving the creation of the device mapper.
luksClose it, because we actually don't need it
Create a device mapper that linearly maps the whole LVM physical volume (i.e the real partition, but without the empty tail).
Remap the LVM device mappers so that they use our own device mapper instead of the partition as physical volume.
Here is the trickiest part, and I still have to look at how pvmove does it to know exactly how this can be done (I guess it will require an additional device mapper): move data chunk by chunk (offsetting with the size of the LUKS headers), starting from the end of the LVM physical volume, and encrypt at the same time. Once a chunk is moved, adjust our device mapper so that it maps to the linear unmoved data followed by the moved encrypted data. As I said earlier, this is a setup that works.
Once everything is moved, we are in a situation where our device mapper is a single crypt mapping, in the same shape as a luksOpen would have made it. We can now copy the LUKS headers at the beginning of the partition, and add the appropriate configuration to /etc/crypttab.
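
The order in which chunks would be moved can be sketched with a few lines of shell. This assumes chunks the size of the LUKS header, with whatever remains at the start of the volume as the last chunk; length and offset are tiny made-up values.

```shell
# Print the (start, size) of each chunk in the order it would be moved,
# going backwards from the end of the volume. Values are in sectors.
length=20
offset=8

schedule=$(
  cursor=$length
  chunk=$offset
  while [ "$cursor" -gt 0 ]; do
    cursor=$((cursor - chunk))
    if [ "$cursor" -lt 0 ]; then
      # Last chunk: shrink it to what is left at the start of the volume.
      chunk=$((chunk + cursor))
      cursor=0
    fi
    echo "$cursor $chunk"
  done
)
echo "$schedule"
```

Moving from the end backwards means each chunk is copied into space that is either free (the tail of the partition) or already vacated by the previous move.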

If I got it all right, both LUKS and LVM should believe they have done this setup themselves, so unmounting, lvchange, etc. should just be fine. It might be necessary to regenerate the initramfs, though.

I'll tell you when I ~~fuck up my hard disk~~ try it.

To be continued...

2007-06-20 19:37:16+0900

miscellaneous, p.d.o

Teaser

2007-06-12 19:42:49+0900

webkit

Sudoku perverts

You might remember the Sudoku solver in XSLT. Now let me introduce you to another range of perverts, with solvers in SQL and regular expressions.

2007-06-10 08:17:09+0900

miscellaneous, p.d.o