Playing more with LVM, LUKS and the device mapper

Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which is quite a challenge considering I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL later, I had worked out how to do what I wanted, and realized it was even possible to do it in a shell script.

So here you are: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I'll detail how it works in between the pieces. As with the previous script, use at your own risk: it comes with no warranty.
Note that you can theoretically still access the filesystems underneath without problems: it's an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-free and well-written solution.

set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{print $1 + $2}')
devsize=$(blockdev --getsz "$dev")
mchunk=8

The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.

  • luks will be the temporary file where we are going to create a temporary LUKS volume.
  • pvsize is the full size of the LVM physical volume, i.e. the size of all the physical extents (pv_size) + the LVM headers and metadata (pe_start).
  • devsize is the raw device size.
  • mchunk is the size in blocks of chunks for the mirror target, see below.
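
To make the pvsize arithmetic concrete, here is a small sketch that feeds awk a line shaped like the output of the pvs command above. The function name pv_end_sector and the sample numbers are made up for illustration; nothing here touches a real volume.

```shell
# Sum pe_start and pv_size (both in 512-byte sectors) from one line of
# `pvs -o pe_start,pv_size --units s --noheadings --nosuffix` output.
pv_end_sector() {
  awk '{print $1 + $2}'
}

# Hypothetical PV: 384 sectors of headers and metadata, 2096128 sectors of extents.
echo "  384 2096128" | pv_end_sector
```

The result is the first sector past the end of the physical volume, which is what the script later compares against the raw device size.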

dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT

Next, we create a sparse luks file the same size as the device (in case luksFormat uses the size somehow, though I believe it doesn't), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file behind when an error occurs later (though during the conversion itself, it will be pointless).
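
If the dd invocation looks odd, here is a sketch of the same trick on its own, assuming GNU dd: seeking past the end with count=0 extends the file without writing any data. The helper name make_sparse is mine, and the 2048-sector size is arbitrary.

```shell
# Create a sparse file of a given number of 512-byte sectors, the same way
# the script does: seek to the end and write nothing.
make_sparse() {
  # $1: file path, $2: size in sectors
  dd of="$1" seek="$2" count=0 bs=512 2> /dev/null
}

tmp=$(mktemp)
make_sparse "$tmp" 2048
wc -c < "$tmp"    # apparent size in bytes
du -k "$tmp"      # space actually allocated on disk
rm -f "$tmp"
```

Comparing the two numbers shows that the file claims its full size while using next to no disk space, which is why it is cheap to make it as large as the device.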

cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF

We create a LUKS device, so that we can get the encryption key ($key), and the size of the LUKS header ($offset). Note that on sid you need to add --showkeys to the dmsetup table command, otherwise the key is not displayed.
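
The read with a here-document simply splits the table line on whitespace. A sketch with a made-up line (the key is a dummy, and the sizes are not from a real device) shows which field ends up in which variable:

```shell
# Split one crypt table line into the variables the script uses.
# This line is a hypothetical example, not real dmsetup output.
line="0 2096120 crypt aes-cbc-essiv:sha256 00112233445566778899aabbccddeeff 0 7:0 1032"

read start length crypt format key IVoff cdev offset <<EOF
$line
EOF

echo "length=$length cipher=$format backend=$cdev header=$offset sectors"
```

The last field is the offset of the encrypted data on the backing device, which is how the script learns the LUKS header size without parsing the header itself.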

if [ $(expr $devsize - $pvsize) -lt $offset ]; then
  echo "Not enough free space after LVM physical volume" >&2
  cryptsetup luksClose ${cryptdev}_real
  exit 1
fi

Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.
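
As a worked example of this check, here is a sketch with hypothetical sizes, all in 512-byte sectors. The function name has_room_for_header is an invention for the illustration.

```shell
# Succeeds when the gap between the end of the PV and the end of the
# device can hold the LUKS header.
has_room_for_header() {
  # $1: devsize, $2: pvsize, $3: offset (LUKS header size)
  [ $(( $1 - $2 )) -ge "$3" ]
}

# Hypothetical 2097152-sector device with a 1032-sector header.
# Only 640 spare sectors: the header does not fit.
has_room_for_header 2097152 2096512 1032 || echo "not enough room"
# 8832 spare sectors, e.g. after shrinking the PV: the header fits.
has_room_for_header 2097152 2088320 1032 && echo "enough room"
```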

if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
  echo "Last chunk is not a multiple of $mchunk sectors" >&2
  cryptsetup luksClose ${cryptdev}_real
  exit 1
fi

This is another check to avoid surprises at the end, when dealing with the last chunk. As the script is written for the moment, it doesn't support cases where this last chunk is not a multiple of $mchunk, so we abort in those cases.
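
The double modulo can be read as "size of the leftover chunk, then its remainder against the mirror region size". A sketch with hypothetical numbers (the function names are mine):

```shell
# Size of the last, partial chunk the copy loop will handle, in sectors.
# The encrypted length is devsize - offset, so length % offset is the
# same value as devsize % offset.
last_chunk() {
  # $1: devsize, $2: offset
  echo $(( $1 % $2 ))
}

# Succeeds when that last chunk is a whole number of mirror regions.
last_chunk_ok() {
  # $1: devsize, $2: offset, $3: mchunk
  [ $(( $1 % $2 % $3 )) -eq 0 ]
}

last_chunk 2097152 1032
last_chunk_ok 2097152 1032 8 && echo "last chunk is aligned"
```

A leftover of zero is fine too: it means the device length is an exact number of header-sized chunks.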

read major minor <<EOF
$(stat -t "$dev" | awk '{print $10,$11}')
EOF
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)/{print \$1}")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
  dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
  dmsetup resume "$map"
done

Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device.

Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list.

Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.
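
To see what the sed in the loop does to a table, here is a sketch that runs it on a made-up line instead of a live mapping. Unlike the script, this variant anchors the pattern with spaces so that a device number such as 3:5 cannot accidentally match 3:50 or 13:5; the function name retarget is an assumption of the example.

```shell
# Rewrite the backing device in one table line (first occurrence only).
retarget() {
  # $1: major:minor to replace, $2: replacement device path
  sed "s, $1 , $2 ,"
}

# Hypothetical table of a logical volume sitting on device 3:5.
echo "0 409600 linear 3:5 384" | retarget 3:5 /dev/mapper/hda5_crypt
```

Everything else in the line (start, length, target type and offset) is left untouched, so the logical volume keeps pointing at the same sectors, just through the intermediate mapper.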

cursor=$length
chunk=$offset
while [ $cursor -gt 0 ]; do
  # Use shell arithmetic rather than expr here: expr exits with status 1
  # when its result is 0, which would abort the script under set -e.
  cursor=$(($cursor - $chunk))
  if [ $cursor -lt 0 ]; then
    chunk=$(($chunk + $cursor))
    cursor=0
  fi
  (
  [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
  echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
  [ $cursor -lt $(($length - $chunk)) ] && echo $(($cursor + $chunk)) $(($length - $cursor - $chunk)) crypt $format $key $(($IVoff + $cursor + $chunk)) $dev $(($offset + $cursor + $chunk))
  ) | dmsetup reload "$cryptdev"
  dmsetup resume "$cryptdev"
  chunks=$(($chunk / $mchunk))
  while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
    true
  done
done

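This is where the main work is done: moving the data around. We actually just let the device mapper deal with the data duplication, $offset blocks by $offset blocks ($offset being the LUKS header size), working backwards from the end of the device and using a mirror target for the chunk being moved. So at any given step, the device is split into up to three zones: the start, not yet converted and still mapped linearly; the chunk being moved, handled by the mirror target; and the end, already encrypted and shifted by $offset, handled by the crypt target.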
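
One way to picture a single step is to print the table the loop would load, without loading it. The sketch below does that for a tiny hypothetical device; step_table is a made-up helper, KEY stands in for the real key, the cipher name is only an example, and the IV offset of the LUKS mapping is assumed to be 0.

```shell
# Print the table for one step of the conversion, without touching any
# device. All numbers are in 512-byte sectors.
step_table() {
  # $1: cursor, $2: chunk, $3: length, $4: offset, $5: device, $6: mchunk
  st_end=$(( $1 + $2 ))
  # Head of the device: still plain data, mapped 1:1.
  [ "$1" -ne 0 ] && echo "0 $1 linear $5 0"
  # Chunk in flight: mirrored from its old place to its encrypted place.
  echo "$1 $2 mirror core 1 $6 2 $5 $1 /dev/mapper/${5##*/}_crypt_real $1"
  # Tail of the device: already encrypted and shifted by the header size.
  [ "$st_end" -lt "$3" ] && echo "$st_end $(( $3 - $st_end )) crypt aes-cbc-essiv:sha256 KEY $st_end $5 $(( $4 + $st_end ))"
  return 0
}

# Third of four steps on a 4128-sector mapping with a 1032-sector header.
step_table 2064 1032 4128 1032 /dev/hda5 8
```

For a step in the middle of the device this prints three lines, one per zone. The first step has no crypt line yet, and the last one has no linear line left.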

We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
I haven't figured out a better way to wait for the end of the mirroring than to loop on dmsetup status; dmsetup wait doesn't seem to be very helpful here.
Anyway, this is the part of the script where you don't want a crash to occur. If it does, all you can do is boot a rescue system and try to find where the encrypted part of the disk starts, in order to set up a device mapper table by hand.
And you'd better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (which /tmp is in a default etch install, for instance; note the script nevertheless works fine in this case). Also note the trap will remove the luks temporary file if the script exits...
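
As an alternative to grepping for a precomputed "$chunks/$chunks" string, the wait loop could parse the status line itself. This is only a sketch: it assumes the mirror status line contains exactly one synced/total field, and the sample lines are made up rather than captured from a real device.

```shell
# Succeed when a mirror status line reports every region in sync.
mirror_in_sync() {
  # $1: one line of `dmsetup status` output
  for field in $1; do
    case $field in
      [0-9]*/[0-9]*)
        [ "${field%/*}" = "${field#*/}" ] && return 0
        return 1 ;;
    esac
  done
  return 1
}

mirror_in_sync "0 1032 mirror 2 3:5 254:1 57/129" || echo "still copying"
mirror_in_sync "0 1032 mirror 2 3:5 254:1 129/129" && echo "done"

# In the script, the polling loop could then become something like:
#   until mirror_in_sync "$(dmsetup status "$cryptdev")"; do sleep 1; done
```

Sleeping between polls also avoids spinning a CPU while the kernel does the copying.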

dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null

Final steps of the conversion: our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and add the LUKS headers at the top of the device we converted.

At this point, the LUKS volume is set up just as if it had been created by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM.

Now, you may want to add the following to your /etc/crypttab file:

$cryptdev $dev none luks

For example, hda5_crypt /dev/hda5 none luks if the device was /dev/hda5.
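
Since the mapping name is derived from the device name exactly as in the script, the crypttab line can be generated rather than typed. A minimal sketch (crypttab_entry is a made-up helper; it only prints the line and leaves editing /etc/crypttab to you):

```shell
# Print the crypttab line matching the naming used by the script.
crypttab_entry() {
  # $1: device path, e.g. /dev/hda5
  echo "$(basename "$1")_crypt $1 none luks"
}

crypttab_entry /dev/hda5
```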

And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):

update-initramfs -u

I tested this successfully under qemu and will give it a shot on my laptop some time soon.

Now, because I had a hard time finding much about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:

<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination > <start_sector> ] ...

logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type:

  • for core, num_params can be either 1 or 2.
    The first parameter is region_size, which is the size (in blocks) of the chunks the mirror log tracks for synchronization. It can't be less than the page size divided by 512, i.e. 8 on x86, must be a power of 2, and must not exceed num_sectors. Apparently, it is not a problem if num_sectors is not a multiple of region_size.
    The optional second parameter is either sync or nosync. I'm not sure of its exact meaning, but I think it determines whether the mirror should do the initial synchronization (sync) or not (nosync).
  • for disk, num_params can be either 2 or 3, with params being log_device (device where the logs are kept), region_size, as above, and an optional sync or nosync, as above too.

num_mirrors is the number of mirrors and for each mirror, we have a pair destination and start_sector.
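
Putting the syntax together, here is a sketch that builds a two-way mirror line with a core log and checks the region_size constraints listed above. The function name mirror_line is mine, and the minimum of 8 sectors assumes 4 KiB pages as on x86.

```shell
# Build a two-way mirror table line using the in-memory (core) log.
mirror_line() {
  # $1: start, $2: num_sectors, $3: region_size,
  # $4: first device, $5: its start sector, $6: second device, $7: its start sector
  ml_rs=$3
  # region_size must be a power of two, at least 8 sectors, and no
  # larger than num_sectors.
  if [ "$ml_rs" -lt 8 ] || [ "$ml_rs" -gt "$2" ] || [ $(( $ml_rs & ($ml_rs - 1) )) -ne 0 ]; then
    echo "bad region_size: $ml_rs" >&2
    return 1
  fi
  echo "$1 $2 mirror core 1 $ml_rs 2 $4 $5 $6 $7"
}

mirror_line 0 1032 8 /dev/hda5 0 /dev/mapper/hda5_crypt_real 0
```

The output can be piped to dmsetup create or dmsetup reload like any other table; the device names here are the ones from the example earlier in this entry.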

[ Update: after a quick look at the device mapper source code in Linus's git tree, updated the mirror target description ]

2007-06-24 12:27:21+0900

miscellaneous, p.d.o
