Archive for December, 2006

Am I ?

I don't know if I should be proud of this...

Yeah, I know, I wrote it in the first place, but still...

2006-12-19 22:59:31+0900

me, p.d.o

Playing evil with VMWare ESX Server, part 2

I finally gave my perverse solution a try.
Using unionfs led to a kernel oops on the service console (which means it brought the VMs down, too).
funionfs, being a FUSE filesystem, couldn't cause the same problem, but I had some issues with it:

  • the result of lseek() was stored in an int variable even though _FILE_OFFSET_BITS is set to 64, so it was impossible to seek beyond 2GB, which ext3 recovery needed to do;
  • when opening a file read/write (which the loopback device does), it also tries to open the original file in the read-only directory read/write, which fails with EBUSY when that file is already open elsewhere (i.e. by VMWare ESX Server).

Once these issues were solved, it... still doesn't work. The problem is that funionfs doesn't handle partially written files: it creates a new file in the r/w directory and writes exactly what has been requested, creating a sparse file. On subsequent reads, it gets the data from that sparse file, which means that everything except the data that has actually been written reads back as zeroes.
I'm afraid there's nothing better to do than pick a somewhat arbitrary chunk size, write to the sparse file chunk by chunk (reading missing data from the original file when necessary), and keep a bitmap of the chunks that have been written to the new file...
I should check what the original unionfs does with this.

2006-12-19 22:51:29+0900

diskimgfs

Playing evil with VMWare ESX Server

During the past month, I've been working on migrating some servers onto a VMWare ESX Server. I tested VMWare Workstation a long time ago, and have had VMWare Server on my work laptop since it became free (as in beer), but ESX Server is something else.

It seems to me, though, that apart from being able to run unmodified OSes without VT/Pacifica technology, it is not technically superior to what you can do with Xen and other free software. For instance, it may be possible to build better virtual switching setups with "standard" Linux bridges and ebtables. But from the administrator's perspective, ESX Server has the advantage of its administration console. I'm still waiting to see a nice, featureful, free (as in speech) configuration tool for Xen (or maybe, dear lazyweb, you could point me to some good URLs I missed).

We don't have the full VMWare Infrastructure, though, so I can't speak about VMotion or VMHA, but they sound neat, on paper. I've never tested Xen migration either, so I won't say it's better :)

Anyway, VMWare ESX Server is a pretty good product, but it has quite a few quirks, and even some really painful misfeatures:

  • Sometimes, the console shows more items than the user is supposed to see. Though you still can't act on them, you wouldn't expect them to show up (for example, I sometimes see items that are only supposed to appear if you have a Virtual Center, which we don't).
  • If you rename a virtual switch, you have to go through all the VMs that were connected to it to change their network configuration accordingly.
  • You now have to edit the settings to connect/disconnect the CD drive or the network. That used to be less annoying with the console software in version 2.5.
  • You can't display more than an hour of performance graphs without a Virtual Center, which is pretty painful when you only have one ESX server (also known as the "buy more of my products" business plan).
  • VMFS doesn't keep readdir() and stat() coherent: the d_ino that readdir() returns in its struct dirent and the st_ino in stat()'s struct stat don't match. This is especially annoying with Legato Networker, which checks that coherency and doesn't save files whose inode has "changed". To work around this misfeature, I installed FUSE and slightly modified libfuse and the fusexmp_fh example so that I can mount a mirror of /vmfs with coherent inodes. Now VM disks can be safely backed up.
  • It's impossible to set up a loopback device on a file residing on VMFS: the filesystem doesn't accept the LOOP_SET_FD request. This means that, although the VM disk files are basically raw disk images, you can't directly mount their filesystems on the service console with a loopback device. Again, with the modified fusexmp_fh program, this is now possible.
  • While there was a (quite broken, as in kernel freeze) way to mount filesystems from VM disks with ESX Server 2.5 using vmware-mount (we also tested 2.5 before upgrading to version 3), the only "official" way I found to do this with version 3 is vcbMount, which requires a VMWare Consolidated Backup server (free neither as in speech nor as in beer; apparently another instance of the "buy more of my products" business plan) and an extra server connected to the SAN.
  • ...

As mentioned above, we can now set up a loopback device on the VM disk files (which needs a little trickery with offsets to get the partition positions right, but that's not very hard). While it's possible to mount filesystems from an offline disk this way, it's not a good idea to mount an ext3 filesystem from a running VM: the filesystem is flagged as needing recovery, so the service console kernel would want to replay the journal, which may have nasty side effects. I don't know about NTFS yet.

There may be a way to "cleanly" mount filesystems from an online disk, though (not yet tested):

  • Snapshot the VM the disk belongs to. After that, the -flat.vmdk file is frozen (only if it's the first snapshot).
  • Use unionfs or funionfs (over FUSE) to keep the real -flat.vmdk file read-only while still being able to write to it, so that the service console kernel can replay the journal onto the writable part of the union.
  • Loopback-mount partitions from the image in the union.

That would be pretty perverse (ext3 over loopback over unionfs over VMFS), but it should just work. I'll post a detailed procedure.

2006-12-16 00:15:55+0900

diskimgfs