Following my previous entry about playing around with LVM, LUKS and the device mapper, I read up on the internals involved in pvmove, which was quite a challenge: I could find no useful documentation for either the device mapper or LVM. A bit of good old UTSL later, I had worked out how to do what I wanted, and realized it could even be done in a shell script.
So here it is: a shell script to transform an LVM physical volume into an LVM over LUKS physical volume. I'll explain how it works between the excerpts. As with the previous script, use at your own risk; it comes with no warranty.
Note that, in theory, you can still access the filesystems underneath without problems: it's an in-place and live conversion. Also note this is more a proof of concept than a proper, risk-free and well-written solution.
set -e
dev=$1
luks=$(mktemp)
cryptdev=$(basename $dev)_crypt
pvsize=$(pvs -o pe_start,pv_size --units s --noheadings --nosuffix "$dev" | awk '{print $1 + $2}')
devsize=$(blockdev --getsz "$dev")
mchunk=8
The script takes the physical volume device to convert as an argument. Note there is no check for the validity of its value.
- luks will be the temporary file where we are going to create a temporary LUKS volume.
- pvsize is the full size of the LVM physical volume, i.e. the size of all the physical extents (pv_size) + the LVM headers and metadata (pe_start).
- devsize is the raw device size.
- mchunk is the size in blocks of chunks for the mirror target, see below.
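Since the script does not validate its argument, here is a minimal sketch of what such a check could look like. The check_pv_arg function is a hypothetical helper of mine, not part of the script; it only verifies that exactly one argument was given and that it names a block device.

```shell
# Hypothetical helper, not part of the original script.
check_pv_arg() {
    # Require exactly one argument.
    if [ $# -ne 1 ]; then
        echo "usage: $0 <physical-volume-device>" >&2
        return 2
    fi
    # Require that the argument is an existing block device.
    if [ ! -b "$1" ]; then
        echo "$1 is not a block device" >&2
        return 1
    fi
    return 0
}
```

Calling check_pv_arg "$@" || exit before the dev=$1 line would stop the script early on an obviously wrong argument. It does not prove the device is actually an LVM physical volume; the pvs call that follows will fail in that case anyway.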
dd of="$luks" seek=$devsize count=0 bs=512 2> /dev/null
luksdev=$(losetup -f)
losetup "$luksdev" "$luks"
trap "losetup -d \"$luksdev\"; rm -f \"$luks\"" EXIT
Next, we create a sparse luks file the same size as the device (in case luksFormat would use the size somehow, but I believe it doesn't), and a loopback device on this file. The trap is here to avoid leaving the loopback device and the file when an error occurs later (though during the conversion itself, it will be pointless).
cryptsetup luksFormat -q "$luksdev"
cryptsetup luksOpen "$luksdev" ${cryptdev}_real
read start length crypt format key IVoff cdev offset <<EOF
$(dmsetup table ${cryptdev}_real)
EOF
We create a LUKS device so that we can get the encryption key ($key) and the size of the LUKS header ($offset). Note you need to add --showkeys to the dmsetup table command on sid.
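To make the read trick more concrete, here is a small sketch that splits a crypt table line the same way. The sample line is made up (the key is a placeholder and the numbers are only plausible), and parse_crypt_table is a hypothetical name used for illustration.

```shell
# Made-up line in the shape the script expects from `dmsetup table`.
sample='0 204792 crypt aes-cbc-essiv:sha256 00112233445566778899aabbccddeeff 0 7:0 2056'

parse_crypt_table() {
    # Split one table line into the same variables the script uses.
    # The last variable (offset) receives whatever remains on the line.
    read start length crypt format key IVoff cdev offset <<EOF
$1
EOF
    echo "length=$length offset=$offset"
}

parse_crypt_table "$sample"
```

With this sample, $length is the number of sectors available behind the LUKS header and $offset is the header size in sectors, which are the two values the rest of the script relies on.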
if [ $(expr $devsize - $pvsize) -lt $offset ]; then
    echo "Not enough free space after LVM physical volume" >&2
    cryptsetup luksClose ${cryptdev}_real
    exit 1
fi
Check we have enough space after the LVM physical volume to offset everything by the size of the LUKS header. If not, you can still try again after you reduce the size of the LVM physical volume by an extent.
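The same test can be written as a small function, which makes it easy to try with numbers before touching a real disk. This is only a sketch; has_room_for_header is a hypothetical name and the sizes below are made up, all expressed in 512-byte sectors.

```shell
has_room_for_header() {
    # $1: raw device size, $2: LVM physical volume size
    # (pe_start + pv_size), $3: LUKS header size.
    devsize=$1
    pvsize=$2
    offset=$3
    # Succeeds when the unused tail of the device can absorb the shift.
    [ $((devsize - pvsize)) -ge "$offset" ]
}

if has_room_for_header 206848 204800 2048; then
    echo "enough room"
fi
```

Here the device has exactly 2048 free sectors after the physical volume, so a 2048-sector header fits, while a 2056-sector one would not.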
if [ $(expr $devsize % $offset % $mchunk) -gt 0 ]; then
    echo "Last chunk is not a multiple of $mchunk blocks" >&2
    cryptsetup luksClose ${cryptdev}_real
    exit 1
fi
This is another check to avoid surprises at the end, when dealing with the last chunk. As currently written, the script doesn't support the case where this last chunk is not a multiple of $mchunk, so we abort when that happens.
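To see why the double modulo works: the data is moved in chunks of $offset sectors starting from the end, so the last chunk handled (the one at the very start of the device) has size $length % $offset, and since $length is $devsize - $offset, that is the same as $devsize % $offset. A sketch with hypothetical function names and made-up sizes:

```shell
last_chunk_size() {
    # $1: raw device size, $2: LUKS header size (both in sectors).
    # A result of 0 means the last chunk is a full $2 sectors.
    echo $(( $1 % $2 ))
}

last_chunk_ok() {
    # $3: mirror region size; the last chunk must be a multiple of it.
    [ $(( $1 % $2 % $3 )) -eq 0 ]
}

last_chunk_size 206848 2056
last_chunk_ok 206848 2056 8 && echo "last chunk is fine"
```

With a 206848-sector device and a 2056-sector header, the last chunk is 1248 sectors, which is a multiple of 8, so the conversion could proceed.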
read major minor <<EOF
$(stat -t "$dev" | awk '{print $10,$11}')
EOF
maps=$(dmsetup deps | awk -F: "/\($major, $minor\)/{print \$1}")
dmsetup create $cryptdev <<EOF
0 $length linear $dev 0
EOF
dmsetup reload ${cryptdev}_real <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume ${cryptdev}_real
for map in ${maps}; do
dmsetup table "$map" | sed s,$major:$minor,/dev/mapper/$cryptdev, | dmsetup reload "$map"
dmsetup resume "$map"
done
Here, we create the dev_crypt device mapper that is our fake LUKS device, which starts as a simple linear mapper and will end as a complete crypt mapper. This fake LUKS device is inserted as an intermediate mapper between the LVM device mapper and the real device. So, the LVM device mapper will have this fake LUKS device as backend, and, at the beginning, the fake LUKS device maps linearly to the real device.
Note we look for all device mappers using the real device as backend before creating the fake LUKS device to avoid finding the fake LUKS device in the list.
Also note dmsetup reload only loads a new table in the INACTIVE slot, and dmsetup resume makes this inactive table LIVE.
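One thing to be aware of in the loop above: the sed substitution matches substrings, so a backend of 3:5 would also match 13:5 or 3:50 in a table. A more careful rewrite, as a sketch, compares whole fields instead. rewrite_backend is a hypothetical helper and the table line is made up.

```shell
rewrite_backend() {
    # $1: old backend as major:minor, $2: new backend path.
    # Reads table lines on stdin and replaces only the fields that
    # are exactly equal to the old backend.
    awk -v old="$1" -v new="$2" '{
        for (i = 1; i <= NF; i++)
            if ($i == old)
                $i = new
        print
    }'
}

echo '0 2097152 linear 3:5 384' | rewrite_backend 3:5 /dev/mapper/hda5_crypt
```

The output of such a filter is what would be piped to dmsetup reload, followed by dmsetup resume to make the new table live.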
cursor=$length
chunk=$offset
# Walk the device from the end to the start, one chunk at a time.
while [ $cursor -gt 0 ]; do
    cursor=$(expr $cursor - $chunk)
    if [ $cursor -lt 0 ]; then
        # Last iteration: shrink the chunk to what is left.
        chunk=$(expr $chunk + $cursor)
        cursor=0
    fi
    (
        [ $cursor -ne 0 ] && echo 0 $cursor linear $dev 0
        echo $cursor $chunk mirror core 1 $mchunk 2 $dev $cursor /dev/mapper/${cryptdev}_real $cursor
        [ $cursor -lt $(expr $length - $chunk) ] && echo $(expr $cursor + $chunk) $(expr $length - $cursor - $chunk) crypt $format $key $(expr $IVoff + $cursor + $chunk) $dev $(expr $offset + $cursor + $chunk)
    ) | dmsetup reload "$cryptdev"
    dmsetup resume "$cryptdev"
    # Busy-wait until all regions of the mirror are in sync.
    chunks=$(expr $chunk / $mchunk)
    while ! dmsetup status "$cryptdev" | grep "$chunks/$chunks"; do
        true
    done
done
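This is where the main work is done: moving the data around. We simply let the device mapper deal with the data duplication, $offset blocks at a time ($offset being the LUKS header size), using a mirror target for the chunk being moved. So at each step, our disk looks like the following: the sectors before $cursor are still mapped linearly to the raw device, the chunk starting at $cursor is being mirrored from its old location to its new, encrypted location, and everything after that chunk is already encrypted and shifted by $offset.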
We use the extra dev_crypt_real device (the previously remapped LUKS device) as the encryption backend for the mirror.
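To get a feel for the order in which chunks are moved, here is a dry-run sketch of the loop arithmetic that touches no device at all. plan_chunks and describe_layout are hypothetical names, and the tiny numbers are only there to keep the output short.

```shell
plan_chunks() {
    # $1: mapped length, $2: LUKS header size (both in sectors).
    # Prints "start size" for each mirrored chunk, last chunk first.
    length=$1
    chunk=$2
    cursor=$length
    while [ "$cursor" -gt 0 ]; do
        cursor=$((cursor - chunk))
        if [ "$cursor" -lt 0 ]; then
            chunk=$((chunk + cursor))
            cursor=0
        fi
        echo "$cursor $chunk"
    done
}

describe_layout() {
    # $1: length, $2: cursor, $3: chunk.
    # Prints "start size type" for each segment of the table.
    if [ "$2" -ne 0 ]; then
        echo "0 $2 linear"
    fi
    echo "$2 $3 mirror"
    if [ "$2" -lt $(( $1 - $3 )) ]; then
        echo "$(( $2 + $3 )) $(( $1 - $2 - $3 )) crypt"
    fi
}

plan_chunks 10 4
describe_layout 10 2 4
```

For a length of 10 and a header of 4, the chunks are moved starting at sectors 6, 2 and 0, the last one being only 2 sectors long. Working from the end of the device is what guarantees a chunk never overwrites data that has not been moved yet.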
I haven't figured out a better way to wait for the end of the mirroring than to loop on dmsetup status; dmsetup wait doesn't seem to be very helpful here.
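A slightly gentler variant of the polling would parse the status line and sleep between checks. The sketch below assumes the mirror status contains a single synced/total field such as 257/257, which is what the grep in the script relies on; the full sample line is made up and mirror_synced is a hypothetical helper.

```shell
mirror_synced() {
    # $1: one line of `dmsetup status` output for a mirror target.
    # Succeeds when the first N/M field has N equal to M.
    echo "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9]+$/) {
                split($i, r, "/")
                exit (r[1] == r[2] ? 0 : 1)
            }
        exit 1
    }'
}

# Made-up status line, for illustration only.
mirror_synced '0 2056 mirror 2 3:5 254:1 257/257 1 AA 1 core' && echo synced

# In the script, the wait could then look like this (untested):
# while ! mirror_synced "$(dmsetup status "$cryptdev")"; do sleep 1; done
```

Compared with the busy loop, this avoids spinning on dmsetup and does not depend on computing $chunks beforehand.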
Anyway, this is the part of the script where you don't want a crash to occur. If it does, all you can do is boot a rescue system and try to find where the encrypted part of the disk starts, so as to set up a device mapper by hand.
And you'd better have the luks temporary file in a directory that is neither in RAM (think tmpfs) nor in the LVM you are converting (/tmp in a default etch install is, for instance; note the script nevertheless works fine in this case). Also note the trap will remove the luks temporary file if the script exits...
dmsetup reload "$cryptdev" <<EOF
$start $length crypt $format $key $IVoff $dev $offset
EOF
dmsetup resume "$cryptdev"
dmsetup remove "${cryptdev}_real"
dd if="$luks" of=$dev count=$offset bs=512 2> /dev/null
Final steps of the conversion: our dev_crypt device becomes a full LUKS volume, so we can remove dev_crypt_real and add the LUKS headers at the top of the device we converted.
At this point, the LUKS volume is set up just as if it had been set up by cryptsetup. For LVM to recognize the change properly, you need to run pvscan. Once you have run it, you can do whatever you want with LVM.
Now, you may want to add the following to your /etc/crypttab file:
$cryptdev $dev none luks
i.e. hda5_crypt /dev/hda5 none luks if the device was /dev/hda5.
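As a small sketch, the crypttab line can be derived from the device path the same way the script derives $cryptdev. crypttab_line is a hypothetical helper; it only prints the line, and appending it to /etc/crypttab is left to you.

```shell
crypttab_line() {
    # $1: the converted device, e.g. /dev/hda5.
    # The mapping name follows the script's <basename>_crypt convention.
    dev=$1
    echo "$(basename "$dev")_crypt $dev none luks"
}

crypttab_line /dev/hda5
```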
And if the LVM volume you converted contains your root filesystem, you should run (for Debian systems):
update-initramfs -u
I tested this successfully under qemu and will give it a shot on my laptop some time soon.
Now, because I had a hard time finding much about the mirror target of the device mapper, here is what I could gather about it. The target syntax is as follows:
<logical_start_sector> <num_sectors> mirror [ core | disk ] <num_params> <param> ... <num_mirrors> [ <destination> <start_sector> ] ...
logical_start_sector, num_sectors, destination and start_sector have the same meaning as in other targets.
core and disk are two different log types (to track differences between mirrors), respectively in memory and on disk.
num_params and params depend on the log type:
- for core, num_params can be either 1 or 2.
The first parameter is region_size, which is the size (in blocks) of the chunks the mirror log tracks for synchronization. It can't be less than the page size divided by 512, i.e. 8 on x86, must be a power of 2, and must not exceed num_sectors. Apparently, it is not a problem if num_sectors is not a multiple of region_size.
The optional second parameter is either sync or nosync, the meaning of which I'm not sure about. I think it determines whether the mirror should do the initial synchronization (sync) or not (nosync).
- for disk, num_params can be either 2 or 3, with params being log_device (device where the logs are kept), region_size, as above, and an optional sync or nosync, as above too.
num_mirrors is the number of mirrors and for each mirror, we have a pair destination and start_sector.
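Putting the syntax together, here is a sketch that checks a region_size against the constraints listed above and builds a two-way mirror line with a core log. Both function names are hypothetical, the minimum of 8 assumes 4096-byte pages, and the device names are just examples.

```shell
valid_region_size() {
    # $1: region_size, $2: num_sectors,
    # $3: minimum region size (page size / 512, assumed 8 by default).
    rs=$1
    n=$2
    min=${3:-8}
    [ "$rs" -ge "$min" ] || return 1
    [ "$rs" -le "$n" ] || return 1
    # A power of two reduces to 1 when halved repeatedly.
    while [ $((rs % 2)) -eq 0 ]; do
        rs=$((rs / 2))
    done
    [ "$rs" -eq 1 ]
}

core_mirror_line() {
    # $1: logical start, $2: num_sectors, $3: region_size,
    # $4 $5: first mirror and its start, $6 $7: second mirror and its start.
    echo "$1 $2 mirror core 1 $3 2 $4 $5 $6 $7"
}

valid_region_size 8 2056 &&
    core_mirror_line 0 2056 8 /dev/hda5 0 /dev/mapper/hda5_crypt_real 0
```

The resulting line has the same shape as the mirror line the conversion loop feeds to dmsetup reload.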
[ Update: after a quick look at the device mapper source code in Linus's git tree, updated the mirror target description ]