Archive for the 'diskimgfs' Category

VMDK open specification

Dear lazyweb,

A while ago, a press release announced that VMware was opening its VMDK specification under GPL-compatible licensing. While anyone can get the specification document, provided they fill in a form, I haven’t seen any note, on the document itself or anywhere other than the press release, saying that this indeed allows an open source implementation.

The only thing that remotely resembles licensing of the document could be:

To ensure that readers of this specification have access to the most current version, readers may download copies of this specification from and no part of this specification (whether in hardcopy or electronic form) may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of VMware, Inc., except as otherwise permitted under copyright law. Please note that the content in this specification is protected under copyright law even if it is not distributed with software that includes an end user license agreement. This specification and the information contained herein is provided on an “AS-IS” basis, is subject to change without notice, and to the maximum extent permitted by applicable law (…)

But this is more about the document itself than about the use of the specification. IANAL, but without more terms, the point of being able to read this specification is moot. It could even be patented and enforced…

Anyone with a better clue?

2007-02-17 10:14:16+0900


Playing evil with VMware ESX Server, part 3

For a reminder of the situation, see also the first two parts of this series.

I finally had time at work to implement a solution for the problem. The result is a bit less than 400 lines of code, which I hope to be able to release as free (as in speech) software. I’ll probably have tons of paperwork to do with my employer before that can happen…

The program implements a FUSE filesystem that you feed with a raw device image file (though I’m willing to implement more image file formats), and that shows the individual partitions as separate files, named “partn.fstype”. These files can then be mounted via loopback devices the standard way (which was not possible on VMFS, for the reason given in my first post on the subject), with the benefit of not requiring offset adjustments (as you would need to mount partitions straight from a disk image) or some loopback device hack.

Additionally, it has an internal Copy-On-Write mechanism so that it is possible to mount “dirty” ext3 filesystems (e.g. a snapshot of a mounted filesystem) without modifying the original disk image. Note that there is no way, yet, to keep these writes after unmounting the fuse filesystem.

It uses libparted to read the partition table, which means it will read any disk label type parted supports, such as BSD labels, Sun partition tables, and so on. It doesn’t support LVM, though.

Unfortunately, since VMware ESX Server 3.0 is based on Red Hat Enterprise Linux 3, only an ancient version of libparted is available. By the way, this version (1.6.3) had a pretty bad bug that made it impossible to use on regular files instead of devices: it was trying to do a BLKGETSIZE (or was it HDIO_GETGEO?) ioctl on them. To work around this, I implemented my own PedArchitecture. It was something of a revelation, because with a similar mechanism, I can implement support for different disk image types :).
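
To illustrate the regular-file problem (this is not the PedArchitecture code, which is C written against libparted): block-device ioctls such as BLKGETSIZE only make sense on device nodes, so code that also wants to accept image files needs another way to find the size. A minimal Python sketch, where `image_size_in_sectors` is a hypothetical helper name and a 512-byte sector is assumed:

```python
import os
import stat

SECTOR_SIZE = 512  # assumed sector size


def image_size_in_sectors(path):
    """Return the size of a disk image in whole sectors.

    Regular files report their size through stat(); for anything else
    (e.g. a block device) we seek to the end instead, which avoids
    device-only ioctls entirely.
    """
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        size = st.st_size
    else:
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.lseek(fd, 0, os.SEEK_END)
        finally:
            os.close(fd)
    return size // SECTOR_SIZE
```

The same function then works whether it is handed a device node or a plain image file.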

Anyway, there have been quite a few API changes in libparted since that version, so the code will need some updates…

In other news, I started working on ext3rminator again. There may be a new release in a few weeks.

Update: Thinking again about the ioctl issue, I’m wondering if vmfs couldn’t be responsible (again) for the failure, actually… I’ll check that on Monday.

2007-01-12 21:58:39+0900


Playing evil with VMware ESX Server, part 2

I finally gave my perverse solution a try. Using unionfs led to a kernel oops on the service console (which means it brought the VMs down, too). funionfs, being a FUSE filesystem, could not lead to the same result, but I had some issues with it:

  • the result of lseek() is stored in an int variable while _FILE_OFFSET_BITS is set to 64… making it impossible to seek past 2 GB, which ext3 recovery needed to do…
  • when a file is opened read/write (which the loopback device does), it attempts to open the original file in the read-only directory read/write as well, which leads to EBUSY when the file is open elsewhere (i.e. by VMware ESX Server).
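
The first issue is easy to reproduce on paper. The sketch below uses Python's ctypes only to mimic what happens when a 64-bit offset is stored in a 32-bit C int. It is not funionfs code, and `store_in_int` is a hypothetical name.

```python
import ctypes


def store_in_int(offset):
    """Mimic `int pos = lseek(...)`: keep only the low 32 bits, signed."""
    return ctypes.c_int32(offset).value


two_gib = 2 ** 31
print(store_in_int(two_gib - 1))   # largest offset that survives intact
print(store_in_int(two_gib))       # wraps around to a negative value
print(store_in_int(3 * 2 ** 30))   # an offset of 3 GiB also comes back negative
```

A negative value is easily mistaken for a failed seek, which would explain why nothing past 2 GB was reachable.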

Once these issues were solved, it… still doesn’t work. The problem is that funionfs doesn’t handle files that are partially written to: it creates a new file in the r/w directory and writes exactly what has been requested, creating a sparse file. On subsequent reads, it gets the data from that sparse file. This means that everything except the data that has been written reads back as zeros. I’m afraid there’s nothing better to do than pick a somewhat arbitrary chunk size, write to the sparse file in chunks (reading the missing data from the original file if necessary), and keep a bitmap of the chunks that have been written to the new file… I should check what the original unionfs does about this.
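
The chunk-and-bitmap idea can be sketched as follows. This is a minimal Python illustration, not funionfs or unionfs code: the class name `ChunkedCow`, the 4096-byte chunk size, and the use of a Python set in place of a real bitmap are all assumptions made for the example.

```python
import io

CHUNK_SIZE = 4096  # arbitrary chunk size, chosen only for this sketch


class ChunkedCow:
    """Copy-on-write overlay: reads fall through to `base` unless the
    chunk has already been copied into `overlay`."""

    def __init__(self, base, overlay, chunk_size=CHUNK_SIZE):
        self.base = base            # original file object, never written to
        self.overlay = overlay      # read/write (sparse) file object
        self.chunk_size = chunk_size
        self.copied = set()         # stands in for the bitmap of copied chunks

    def _copy_up(self, index):
        """Copy one whole chunk from base to overlay before its first write."""
        if index in self.copied:
            return
        start = index * self.chunk_size
        self.base.seek(start)
        data = self.base.read(self.chunk_size)
        self.overlay.seek(start)
        self.overlay.write(data)
        self.copied.add(index)

    def write(self, offset, data):
        if not data:
            return
        first = offset // self.chunk_size
        last = (offset + len(data) - 1) // self.chunk_size
        for index in range(first, last + 1):
            self._copy_up(index)
        self.overlay.seek(offset)
        self.overlay.write(data)

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index = offset // self.chunk_size
            chunk_end = (index + 1) * self.chunk_size
            n = min(end, chunk_end) - offset
            # Copied chunks come from the overlay, all others from the base.
            src = self.overlay if index in self.copied else self.base
            src.seek(offset)
            piece = src.read(n)
            out += piece
            if len(piece) < n:
                break               # reached the end of the file
            offset += n
        return bytes(out)


base = io.BytesIO(b"A" * 10000)
cow = ChunkedCow(base, io.BytesIO())
cow.write(5000, b"XX")
print(cow.read(4998, 6))            # data around the write, merged from both files
```

Because whole chunks are copied up before being modified, a read never has to guess whether a zero in the sparse file is real data or a hole.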

2006-12-19 22:51:29+0900


Playing evil with VMware ESX Server

During the past month, I’ve been working on migrating some servers onto a VMware ESX Server. I did test VMware Workstation a long time ago, and have had a VMware Server on my work laptop since it was made free (as in beer), but ESX Server is something else.

It seems to me, though, that apart from being able to run unmodified OSes without VT/Pacifica technology, it is not technically superior to what you can do with Xen and other free software. For instance, it may be possible to do better virtual switching setups with “standard” Linux bridges and ebtables. But from the administrator’s perspective, ESX Server has the advantage of its administration console. I’m still waiting to see nice and featureful free (as in speech) configuration software for Xen (or maybe, dear lazyweb, you could show me some good URLs I missed).

We don’t have the full VMware Infrastructure, though, so I can’t speak of VMotion or VMHA, but that sounds neat, on paper. I never tested Xen migration either, so I won’t say it’s better :)

Anyway, VMware ESX Server is a pretty good product, but there are quite a few quirks, or even really painful misfeatures:

  • Sometimes, the console shows more items than what the user is supposed to see. Though you still can’t act on these, you wouldn’t expect them to show up (for example, I sometimes see items that are only supposed to appear if you have a Virtual Center, which we don’t have).
  • If you rename a virtual switch, you have to go through all the VMs that were connected to it to change their network configuration accordingly.
  • You now have to edit the settings to connect/disconnect the CD drive or the network. That used to be less annoying with the console software in version 2.5.
  • You can’t display more than an hour of performance graphs without a Virtual Center. Pretty painful when you only have one ESX server. (also known as the “buy more of my products business plan”)
  • VMFS doesn’t maintain coherency between readdir() and stat(). The d_ino readdir() returns in its struct dirent and st_ino in stat()’s struct stat don’t match. This is especially annoying with Legato Networker, which checks that coherency and doesn’t save files that “changed inode”. To circumvent this misfeature, I installed fuse and slightly modified libfuse and the fusexmp_fh example so that I can mount a mirror of /vmfs with coherent inodes. Now VM disks can be safely saved.
  • It’s impossible to create a loopback device on a file residing on VMFS. The filesystem doesn’t accept the LOOP_SET_FD request. This means that, while the VM disks files are basically raw disk images, you can’t directly mount the filesystems on the service console with a loopback device. Again, with the modified fusexmp_fh program, this is now possible.
  • While there was a (quite broken, as in kernel freeze) way to mount filesystems from VM disks with ESX Server 2.5 (which we also tested before upgrading to version 3) using vmware-mount, the only “official” way I found to do this with version 3 is to use vcbMount, which requires a VMware Consolidated Backup server (free neither as in speech nor as in beer; seems to be another instance of the “buy more of my products business plan”), and an extra server connected to the SAN.
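
The readdir()/stat() mismatch from the list above is easy to check for on any directory. The sketch below is a minimal Python illustration, not what Legato Networker actually does; the function names are hypothetical. On Unix, `os.DirEntry.inode()` reports the d_ino value returned by readdir() without an extra stat() call, while `os.lstat()` supplies st_ino.

```python
import os


def inode_report(directory):
    """Return sorted (name, d_ino, st_ino) tuples for each entry in `directory`."""
    report = []
    with os.scandir(directory) as entries:
        for entry in entries:
            d_ino = entry.inode()                    # from readdir()
            st_ino = os.lstat(entry.path).st_ino     # from (l)stat()
            report.append((entry.name, d_ino, st_ino))
    return sorted(report)


def incoherent_entries(directory):
    """Entries whose readdir() and stat() inode numbers disagree."""
    return [row for row in inode_report(directory) if row[1] != row[2]]
```

On a coherent filesystem `incoherent_entries()` should come back empty; on a filesystem behaving as described above, it would list the affected files.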

As said above, we can now set up a loopback device on the VM disk files (which needs a little trickery with offsets to get the partition positions right, but that’s not very hard). While it’s possible to mount filesystems from an offline disk this way, it’s not a good idea to mount an ext3 filesystem from a running VM, because the filesystem is flagged for recovery, and the service console kernel would want to replay the journal, which may have nasty side effects. I don’t know about NTFS yet.
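
The offset trickery amounts to reading the partition table and converting start sectors to bytes. Here is a minimal Python sketch under a few assumptions: a classic MBR (DOS) partition table, 512-byte sectors, and primary partitions only (extended and logical partitions are ignored). `mbr_partitions` is a hypothetical helper name.

```python
import struct

SECTOR_SIZE = 512      # assumed sector size
TABLE_OFFSET = 446     # the four primary partition entries start here
ENTRY_SIZE = 16


def mbr_partitions(sector0):
    """Return (number, type, byte_offset, byte_length) for each primary partition."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature found")
    result = []
    for n in range(4):
        start = TABLE_OFFSET + n * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        ptype = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        lba_start, sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0 or sectors == 0:
            continue   # unused slot
        result.append((n + 1, ptype, lba_start * SECTOR_SIZE, sectors * SECTOR_SIZE))
    return result
```

Feeding it the first 512 bytes of a disk image gives the byte offset to hand to losetup's `-o` option for each partition.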

There may be a solution to “cleanly” mount filesystems from an online disk, though (not yet tested):

  • Snapshot the VM which the disk is on. After that, the -flat.vmdk file is frozen (only if it’s the first snapshot).
  • Use unionfs or funionfs (over FUSE) to keep the real -flat.vmdk file read-only while still allowing writes to it, so that the service console kernel can replay the journal on the writeable part of the unionfs.
  • Loopback mount partitions from the image in the unionfs.

That would be pretty perverse (ext3 over loopback over unionfs over vmfs), but should just work. I’ll post a detailed procedure.

2006-12-16 00:15:55+0900
