Archive for the 'p.d.o' Category

Playing more with LVM, LUKS and the device mapper

Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which was something of a challenge: I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL later, I had worked out how to do what I wanted, and realized it was even possible to do it in a shell script.

So here you are: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I'll detail how it works in between. As with the previous script, use at your own risk, it comes with no warranty.
Note you can theoretically still access the filesystems underneath without problems: it's an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-free and well-written solution.

set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{print $1 + $2}')
devsize=$(blockdev --getsz "$dev")
mchunk=8

The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.

  • luks will be the temporary file where we are going to create a temporary LUKS volume.
  • pvsize is the full size of the LVM physical volume, i.e. the size of all the physical extents (pv_size) + the LVM headers and metadata (pe_start).
  • devsize is the raw device size.
  • mchunk is the size in blocks of chunks for the mirror target, see below.
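Since the script does not validate its argument, a minimal guard could be placed before the variables are set. This is only a sketch: check_pv_arg is a hypothetical helper that is not part of the original script, and it only checks that the argument names a block device, not that it really is an LVM physical volume.

```shell
# Hypothetical guard, not part of the original script.
# Returns 0 only if the argument is a non-empty path to a block device.
check_pv_arg() {
  if [ -z "$1" ]; then
    echo "usage: $0 <physical volume device>" >&2
    return 1
  fi
  if [ ! -b "$1" ]; then
    echo "$1 is not a block device" >&2
    return 1
  fi
  return 0
}

# Example: /dev/null is a character device, so the guard rejects it.
check_pv_arg /dev/null || echo "refusing to continue"
```

In the script itself, this would be called as check_pv_arg "$1" || exit 1 right after set -e.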

dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT

Next, we create a sparse luks file the same size as the device (in case luksFormat would use the size somehow, but I believe it doesn't), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file when an error occurs later (though during the conversion itself, it will be pointless).

cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF

We create a LUKS device, so that we can get the encryption key ($key), and the size of the LUKS header ($offset). Note you need to add --showkey to the dmsetup table command on sid.
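To see how the read picks the table apart, here is the same parsing applied to a made-up line. The sample is an assumption about what dmsetup table could print for a crypt mapping on a loop device; the cipher, key and numbers are placeholders, not real output.

```shell
# Made-up sample line for a crypt mapping; every value is a placeholder.
sample="0 2096120 crypt aes-cbc-essiv:sha256 00112233445566778899aabbccddeeff 0 7:0 1032"

# Same field names as in the script above.
read start length crypt format key IVoff cdev offset <<EOF
$sample
EOF

echo "payload length: $length sectors"
echo "LUKS header size: $offset sectors"
```

The last field is the offset of the encrypted payload on the backing device, which is why it doubles as the size of the LUKS header.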

if [ $(expr $devsize - $pvsize) -lt $offset ]; then
  echo Not enough free space after LVM physical volume >&2
  cryptsetup luksClose ${cryptdev}_real
  # exit with a failure status; a bare exit would return luksClose's status
  exit 1
fi

Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.
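The arithmetic of this check can be tried with made-up numbers. In the sketch below, enough_tail_space is a hypothetical helper, the sizes are invented, and an extent is assumed to be 8192 sectors (4MB); all values are in 512-byte sectors.

```shell
# enough_tail_space DEVSIZE PVSIZE OFFSET
# Succeeds when the gap after the physical volume can hold the LUKS header.
enough_tail_space() {
  [ $(( $1 - $2 )) -ge "$3" ]
}

devsize=2090344   # made-up raw device size
pvsize=2089344    # made-up pe_start + pv_size
offset=1032       # LUKS header size

if enough_tail_space $devsize $pvsize $offset; then
  echo "enough room"
else
  echo "only $((devsize - pvsize)) sectors free, need $offset"
fi

# After shrinking the physical volume by one 8192-sector extent:
pvsize=$((pvsize - 8192))
enough_tail_space $devsize $pvsize $offset && echo "enough room after shrinking by one extent"
```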

if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
  echo Last >&2
  cryptsetup luksClose ${cryptdev}_real
  # exit with a failure status; a bare exit would return luksClose's status
  exit 1
fi

This is another check to avoid surprises at the end, when dealing with the last chunk. As the script is written for the moment, it doesn't support cases where this last chunk is not a multiple of $mchunk. So we need to abort in these.
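Why the test uses $devsize rather than $length may not be obvious: the data is moved in chunks of $offset sectors over a payload of $devsize - $offset sectors, so the leftover chunk is ($devsize - $offset) % $offset, which equals $devsize % $offset. A small sketch with invented sizes (leftover and leftover_ok are hypothetical helpers):

```shell
# leftover DEVSIZE OFFSET: sectors left for the final, smaller chunk
# (0 means the payload divides evenly into chunks of OFFSET sectors)
leftover() {
  echo $(( $1 % $2 ))
}

# leftover_ok DEVSIZE OFFSET MCHUNK: succeeds when that leftover is a
# whole number of mirror regions
leftover_ok() {
  [ $(( $1 % $2 % $3 )) -eq 0 ]
}

for devsize in 2097152 2097155; do   # made-up device sizes in sectors
  if leftover_ok $devsize 1032 8; then
    echo "$devsize: leftover $(leftover $devsize 1032) sectors, fine"
  else
    echo "$devsize: leftover $(leftover $devsize 1032) sectors, not a multiple of 8"
  fi
done
```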

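read major minor <<EOF
$(stat -t "$dev" | awk '{print $10,$11}')
EOF
# stat -t prints the device numbers in hexadecimal, dmsetup uses decimal
major=$((0x$major))
minor=$((0x$minor))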
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)/{print \$1}")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
  dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
  dmsetup resume "$map"
done

Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device.

Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list.

Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.
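The table rewrite applied to each LVM mapping before the reload/resume pair can be tried on its own. The table line below is made up (a logical volume sitting on /dev/hda5, device 3:5); only the sed expression comes from the script.

```shell
major=3
minor=5
cryptdev=hda5_crypt
line="0 204800 linear 3:5 384"   # made-up table line of a logical volume

# Replace the backing device with our intermediate mapping; the rest of
# the line (start, length, target type, offset) is left untouched.
echo "$line" | sed "s,$major:$minor,/dev/mapper/$cryptdev,"
```

The rewritten line is what dmsetup reload receives; nothing changes for the logical volume until the following dmsetup resume swaps the inactive table in.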

cursor=$length
chunk=$offset
while [ $cursor -gt 0 ]; do
  cursor=$(expr $cursor - $chunk)
  if [ $cursor -lt 0 ]; then
    chunk=$(expr $chunk + $cursor)
    cursor=0
  fi
  (
  [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
  echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
  [ $cursor -lt $(expr $length - $chunk) ] && echo $(expr $cursor + $chunk) $(expr $length - $cursor - $chunk) crypt $format $key $(expr $IVoff + $cursor + $chunk) $dev $(expr $offset + $cursor + $chunk)
  ) | dmsetup reload "$cryptdev"
  dmsetup resume "$cryptdev"
  chunks=$(expr $chunk / $mchunk)
  while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
    true
  done
done

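This is where the main work is done: moving the data around. We actually just let the device mapper deal with the data duplication, $offset blocks by $offset blocks ($offset being the LUKS header size), starting from the end of the device and using a mirror target for the chunk being moved. So at each step our device is made of up to three parts: the data not yet moved, still mapped linearly; the chunk being moved, under the mirror target; and the data already moved, shifted by $offset blocks and mapped through crypt.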

We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
I haven't figured out a better way to wait for the end of the mirroring than a loop checking with dmsetup status; dmsetup wait doesn't seem to be very helpful here.
Anyways, this is the part of the script where you don't want a crash to occur. Because if it does, all you can do is boot a rescue system and try to find where the encrypted part of the disk starts, in order to set up a device mapper by hand.
And you'd better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (/tmp in a default etch install is, for instance; note the script nevertheless works fine in this case). Also note the trap will remove the luks temporary file if the script exits...
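To follow what the loop loads at each step without touching any device, here is a dry-run sketch that only prints the tables. print_tables is a hypothetical helper, and the device names, key and sizes in the example call are placeholders; real values would be far larger.

```shell
# print_tables LENGTH OFFSET MCHUNK DEV REALDEV FORMAT KEY IVOFF
# Prints the table the conversion loop would load at each step.
print_tables() {
  length=$1 offset=$2 mchunk=$3 dev=$4 real=$5 format=$6 key=$7 ivoff=$8
  cursor=$length
  chunk=$offset
  while [ "$cursor" -gt 0 ]; do
    cursor=$((cursor - chunk))
    if [ "$cursor" -lt 0 ]; then
      # final chunk at the start of the device: shorter than the others
      chunk=$((chunk + cursor))
      cursor=0
    fi
    echo "# step: cursor=$cursor chunk=$chunk"
    # data not moved yet
    if [ "$cursor" -ne 0 ]; then
      echo "0 $cursor linear $dev 0"
    fi
    # chunk being moved
    echo "$cursor $chunk mirror core 1 $mchunk 2 $dev $cursor $real $cursor"
    # data already moved and encrypted
    end=$((cursor + chunk))
    if [ "$end" -lt "$length" ]; then
      echo "$end $((length - end)) crypt $format $key $((ivoff + end)) $dev $((offset + end))"
    fi
  done
}

# Tiny made-up example: 24 sectors of payload, 16-sector "header".
print_tables 24 16 8 /dev/sdX /dev/mapper/sdX_crypt_real aes-cbc-essiv:sha256 KEY 0
```

In this example the first step has no crypt part yet and the last step has no linear part left, which matches the two conditionals in the real loop.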

dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null

Final steps of the conversion : our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and add the LUKS headers at the top of the device we converted.

At this point, the LUKS volume is set up just as if it had been set up by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM.

Now, you may want to add the following to your /etc/crypttab file:

$cryptdev $dev none luks

e.g. hda5_crypt /dev/hda5 none luks if the device was /dev/hda5.
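The line can be generated with the same naming rule the script uses for $cryptdev. A small sketch, where crypttab_line is a hypothetical helper that only prints the line and leaves /etc/crypttab alone:

```shell
# crypttab_line DEV: print the crypttab entry for a converted device,
# named like $cryptdev in the script (basename of the device + _crypt).
crypttab_line() {
  dev=$1
  echo "$(basename "$dev")_crypt $dev none luks"
}

crypttab_line /dev/hda5
```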

And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):

update-initramfs -u

I tested this successfully under qemu and will give it a shot on my laptop some time soon.

Now, because I had a hard time finding anything about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:

<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination > <start_sector> ] ...

logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type:

  • for core, num_params can be either 1 or 2.
    The first parameter is region_size, which is the size (in blocks) of the chunks the mirror log tracks for synchronization. It can't be less than the page size divided by 512, i.e. 8 on x86, must be a power of 2, and must not exceed num_sectors. Apparently, it is not a problem if num_sectors is not a multiple of region_size.
    The optional second parameter is either sync or nosync; I'm not sure of its meaning. I think it determines whether the mirror should do the initial synchronization (sync) or not (nosync).
  • for disk, num_params can be either 2 or 3, with params being log_device (device where the logs are kept), region_size, as above, and an optional sync or nosync, as above too.

num_mirrors is the number of mirrors and for each mirror, we have a pair destination and start_sector.
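As an illustration of the syntax, here is a sketch that assembles a two-way mirror line with a core log and checks two of the region_size constraints listed above. mirror_line is a hypothetical helper and the device names are placeholders; the page-size minimum is not checked because it depends on the architecture.

```shell
# mirror_line START LEN REGION SRC SRC_START DST DST_START
# Prints a two-way mirror table line using a core log.
mirror_line() {
  region=$3
  # a power of two has a single bit set, so n & (n - 1) is 0
  if [ "$region" -le 0 ] || [ $(( region & (region - 1) )) -ne 0 ]; then
    echo "region_size must be a power of 2" >&2
    return 1
  fi
  if [ "$region" -gt "$2" ]; then
    echo "region_size must not exceed num_sectors" >&2
    return 1
  fi
  echo "$1 $2 mirror core 1 $region 2 $4 $5 $6 $7"
}

mirror_line 0 1032 8 /dev/hda5 0 /dev/mapper/hda5_crypt_real 0
```

The printed line has the same shape as the mirror line the conversion loop feeds to dmsetup reload.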

[ Update: after a quick look at the device mapper source code in Linus's git tree, updated the mirror target description ]

2007-06-24 12:27:21+0900

miscellaneous, p.d.o

Videos of Debconf 7

The good thing about not being able to attend Debconf is that you can still watch the talks online. Well, when you can find the videos.

While it's pretty easy to find the video streams, which are pretty much of no use when you're a working European, I couldn't find a single link to the video archives that were supposed to be available the same day.

Why oh why does the debconf7 homepage lack such an important link ? Why does video.debconf.org show a phpmyadmin login form ?

Anyways, for those who, like me, haven't found a link, here's one : Video archives for Debconf 7.

2007-06-24 08:34:56+0900

debian

Playing around with LVM, LUKS and the device mapper

I finally decided to switch my laptop to an LVM over LUKS setup instead of LVM alone. The problem is that there is currently no way that I know of to do this in-place. I do have an external disk on which I could pvmove everything and back, but that would not be very challenging ;). So I started wondering how doable that would be.

LUKS has an overhead of 1032 sectors (a sector being 512 bytes) in which it stores the LUKS header. Following the header, encrypted data just takes as much space as its decrypted counterpart, linearly. So all it would take to do in-place LUKS conversion would be to move data by 1032 sectors, and encrypt it. Which means the main problem is to have 1032 sectors free after the current volume.

Fortunately, LVM doesn't take up all the space on the partition a physical volume is set up on, except if you're unlucky or created partitions specifically so that it uses all the space. Because LVM physical volumes use the space in big chunks (4MB by default), if the size of the partition is not 4MB aligned (modulo the LVM header), you have some free space between the physical volume end and the partition end. We thus need this free space to be greater than 1032 sectors. On my laptop, there is more than 1.8MB free there, so there would be no problem.
Note the debian-installer seems to do more than a simple cryptsetup luksFormat, as LUKS volumes it creates have an overhead of 2056 sectors instead of the 1032 sectors I got creating a volume by hand.
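The free tail can be estimated with a bit of arithmetic. This is a sketch under stated assumptions: the physical volume is assumed to hold as many whole extents as fit after the LVM header, and the partition size and header size (384 sectors) are made-up values; on a real system pvs reports the actual figures. tail_free is a hypothetical helper.

```shell
# tail_free PART_SECTORS PE_START EXTENT_SECTORS
# Sectors left after the last whole extent that fits in the partition.
tail_free() {
  echo $(( ($1 - $2) % $3 ))
}

part=20000000    # made-up partition size in sectors
pe_start=384     # made-up size of the LVM header area in sectors
extent=8192      # 4MB extents: 4 * 1024 * 1024 / 512 sectors

free=$(tail_free $part $pe_start $extent)
echo "free tail: $free sectors"
if [ "$free" -ge 1032 ]; then
  echo "enough room for a 1032-sector LUKS header"
fi
```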

Next problem is how to move the data and encrypt it in-place. Bonus points if that can be done without unmounting the filesystems.

My first guess was to play around with the device mapper and a specially crafted FUSE filesystem to create a fake LUKS volume that could be half encrypted and half not, and would evolve, with time, into a fully encrypted LUKS volume. Since I didn't know much about the device mapper, I looked at it a bit, and discovered a FUSE intermediate would be unnecessary: not only can you remap on-the-fly (which I already guessed was somehow possible, considering how LVM can do it, though I didn't know much about the interaction between LVM and the d-m), but you can also mix dm-crypt and linear chunks in the same d-m table.

It also appears LVM over LUKS has an overhead that is actually not necessary. How sad both use the device mapper but yet can't do an efficient setup with it.

When you set up LVM over LUKS, you open a LUKS volume, which will create a dm-crypt device mapper, and use this device mapper as the physical volume for LVM. Logical volumes created with LVM are actually device mappers using the physical volume as destination device. When you access a block on an LVM logical volume, your access is first mapped through the LV device mapper and then mapped again through the dm-crypt mapper. So it goes through 2 device mappers, but it could do with only one: the linear mapping used in most cases with LVM could be replaced by crypt mappings, which are just as linear, but also handle the encryption/decryption part.

Playing around under qemu with a freshly installed etch system, I was able to implement this and came up with the following script. It should work with most setups, but use at your own risk, it comes with no warranty.

MAJOR=$(awk '/device-mapper/{print $1;exit}' /proc/devices)
dmsetup ls --target linear | while read map extra; do
  TABLE=$(dmsetup table $map | while IFS=" :" read start length linear major minor offset; do
    map2=
    if [ "$major" -eq "$MAJOR" ]; then
      IFS=":" read map2 extra <<EOF
$(dmsetup info -c --noheadings -j $major -m $minor)
EOF
    fi
    if [ -z "$map2" ] || [ $(dmsetup table "$map2" | wc -l) -gt 1 ] || [ $(dmsetup table --target crypt "$map2" | wc -l) -eq 0 ]; then
      echo $start $length $linear $major:$minor $offset
    else
      IFS=" " read cstart clength crypt format key IVoff dev coffset <<EOF
$(dmsetup table "$map2")
EOF
      echo $start $length crypt $format $key $(expr $IVoff + $offset) $dev $(expr $coffset + $offset)
    fi
  done)
  if (dmsetup table $map; echo "$TABLE") | sort | uniq -c | grep -v "^ *2 " > /dev/null 2>&1;
  then
    echo Diverting $map
    dmsetup reload $map <<EOF
$TABLE
EOF
    dmsetup resume $map
  fi
done

[ Update: There was a dmsetup resume missing in the script ]
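The core rewrite done by the script above can be tried on made-up table lines without touching the device mapper. In this sketch, merge_line is a hypothetical helper and both sample lines are invented: the first is a linear line of a logical volume, the second the single crypt line of the LUKS mapping underneath.

```shell
# merge_line LINEAR_LINE CRYPT_LINE
# Fold a linear mapping into the crypt mapping it sits on: both the IV
# offset and the device offset are shifted by the linear offset.
merge_line() {
  read start length kind target offset <<EOF
$1
EOF
  read cstart clength ckind format key ivoff dev coffset <<EOF
$2
EOF
  echo "$start $length crypt $format $key $((ivoff + offset)) $dev $((coffset + offset))"
}

merge_line "0 204800 linear 254:0 384" \
           "0 4000000 crypt aes-cbc-essiv:sha256 KEY 0 3:5 1032"
```

Shifting the IV offset along with the device offset keeps every sector encrypted with the same IV as before, so the data stays readable through the merged mapping.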

After running this script, you should be able to run cryptsetup luksClose on your LUKS volume. Once you've done it, pvdisplay and other LVM tools will show you nothing, because the physical volume will just have disappeared... but everything will still be in place. Note that if you leave the LUKS volume open, you'd better be careful NOT to do operations with LVM (such as pvmove) because it could break everything.

Note that on sid, you would need to add a --showkey option into the $(dmsetup table "$map2") line. This option is necessary for dmsetup to display the encryption keys.

I wonder if LVM upstream would consider supporting LUKS volumes directly...

Anyways, back to the in-place conversion to LVM over LUKS. The idea would be to:

  • Create a file the size of the LUKS headers + one block.
  • Create a loopback device for this file.
  • luksFormat the loopback device.
  • luksOpen the loopback device, so that dmsetup can give us the key to use. I wonder if there are ways to get the key not involving the creation of the device mapper.
  • luksClose it, because we actually don't need it
  • Create a device mapper that linearly maps the whole LVM physical volume (i.e the real partition, but without the empty tail).
  • Remap the LVM device mappers so that they use our own device mapper instead of the partition as physical volume.
  • Here is the trickiest part, and I still have to take a look at how pvmove does it to know exactly how this can be done (I guess it will require an additional device mapper): move data chunk by chunk (offsetting by the size of the LUKS headers), starting from the end of the LVM physical volume, and encrypt at the same time. Once a chunk is moved, adjust our device mapper so that it maps to the linear unmoved data followed by the moved encrypted data. As I said earlier, such a mix of linear and crypt chunks in one table is a setup that works.
  • Once everything is moved, we are in a situation where our device mapper is a single crypt mapping, in the same shape as a luksOpen would have made it. We can now copy the LUKS headers at the beginning of the partition, and add the appropriate configuration to /etc/crypttab.
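The first step of this plan can be sketched without root privileges. The example assumes GNU dd, where seeking past the end with count=0 only sets the file size, and uses the 1032-sector header size measured earlier; the loopback and cryptsetup steps need root and are left out.

```shell
hdr=1032   # LUKS header size in sectors, as measured earlier

luks=$(mktemp)
# One block more than the header; nothing is actually written, so the
# file is sparse.
dd of="$luks" seek=$((hdr + 1)) count=0 bs=512 2> /dev/null

size=$(wc -c < "$luks")
echo "temporary LUKS file: $size bytes"
rm -f "$luks"
```

With GNU dd this should report 528896 bytes, i.e. 1033 sectors of 512 bytes.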

If I got it all right, both LUKS and LVM should believe they have done this setup themselves, so unmounting, lvchange, etc. should just be fine. It might be necessary to regenerate the initramfs, though.

I'll tell you when I try it (or rather, when I fuck up my hard disk).

To be continued...

2007-06-20 19:37:16+0900

miscellaneous, p.d.o

Teaser

2007-06-12 19:42:49+0900

webkit

Sudoku perverts

You might remember the Sudoku solver in XSLT. Now let me introduce you to another range of perverts, with solvers in SQL and regular expressions.

2007-06-10 08:17:09+0900

miscellaneous, p.d.o

Mozilla feature request response time

Eddy has been exposing some of the long-standing feature requests for Mozilla. They have a huge record of such requests. My favourites:

And these requests are only a small subset...

On the other hand, I'm amazed the 1.8.1 branch (the one for Firefox 2.0) is ABI compatible with 1.8.0 (Firefox 1.5). I'm currently running epiphany with a pre-release of libxul 1.8.1.3 (coming soon in sid), and it works flawlessly (so far), without rebuilding epiphany.
I must say I'm impressed, especially when I look back at how 1.7.x releases were.

2007-05-23 19:28:34+0900

miscellaneous, p.d.o

Mail patterns

A loooong time ago, someone on planet debian described mail patterns. What was this one again ?

2007-05-11 08:54:22+0900

debian

Numbers

13256278887989457651018865901401704613 is a prime.
13256278887989457651018865901401704671 is the following prime.
Between them exists an (allegedly) forbidden number. If you don't know how to find it, it's 27 more than the first and 31 less than the second. Oops.
Edit: Or you could multiply the following primes: 2, 5, 19, 12043, 216493, and 836256503069278983442067 and then multiply by 32.

2007-05-02 20:16:38+0900

miscellaneous, p.d.o

Google Keywords – Apr 2007

"adding the iceweasel branding to firefox windows" - coming soon
"asa dotzler sucks" - Blake Ross again ?
"booh debian" - apt-get moo
"ca passe" - ou ça casse
"dumb ass 2" - the return
"fix fucked up partition table" - you should try parted
"good variables to test" - foo ? bar ?
"hate soccer" - me too
"i could not agree less" - can't say disagree ?
"plus c'est gros mieux ça passe" - this also applies to what Sarkozy says
"shaking laptop while in use" - not convenient to type
"us consumer protection versus firefox" - Stella award in progress ?

2007-05-01 19:02:19+0900

p.d.o, website

Bug triaging

A week ago, I started the long overdue task of triaging bugs reported on iceape/mozilla. When I started, the count was 364, duplicates excluded.

I'm still not halfway through the bug list, but already the bug count is down to 263, as of today. That is about a hundred bugs closed or merged. I can't believe there were still bugs so old but yet not merged to even older occurrences of the same bugs.

Since it is easier to appreciate the effort with nice colors, I started graphing the bug counts for iceape.

Sune Vuorela started some similar things for kde a while ago, so, thinking this could be useful for a whole lot more people than KDE and Mozilla maintainers, I also started graphing the bug counts for all packages. There is not enough backlog to have more than the last week, though. I'm a bit concerned about the fact that the RRD updates take a lot of time and may induce some load on gluck... Please DSA hit me if you want it to be moved elsewhere.

Anyways, back to iceape bugs, the oldest bug I closed was #80787, which was actually fixed as soon as I first uploaded xulrunner a year ago. On the other hand, the oldest I didn't close is #78654. There are some other winners that I'm amazed have not been dealt with upstream in the last 6 years.

Now, while I tried to correctly and conveniently tag these oldest bugs, I didn't take the required amount of time to track them in upstream bugzilla. So if you, dear reader, have a little bit of time, I would really appreciate if you could go through the list of bugs* that are tagged upstream, but not yet forwarded and either find them in upstream bugzilla, or file them if they don't exist there.

* strangely enough, the raw=yes argument to the BTS doesn't seem to work properly. You can get a raw list (with no ordering by status or severity) through the LDAP gateway with the following command line:

ldapsearch -p 10101 -h bts2ldap.debian.net -x -b dc=current,dc=bugs,dc=debian,dc=org '(&(debbugsSourcePackage=iceape) (debbugsTag=upstream) (!(debbugsState=forwarded)) (!(debbugsState=done)))'

Note the LDAP query is probably not optimal. You can add debbugsID at the end of the command line if you're only interested in the bug numbers.

2007-04-30 20:03:00+0900

debian, iceape