[Sheepdog] VDI locking, migration and qemu

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Sat Nov 28 16:35:46 CET 2009


On 11/27/2009 10:43 PM, Chris Webb wrote:
> Oops, I didn't know qemu-io existed; added that to the patch! I've updated
> qemu-img.c too. I also noticed that your bdrv_close_all() function didn't do
> things like close backing images correctly, so I've changed it to call
> bdrv_close() (which does do the right thing) and reindented in standard qemu
> style. Hope that's okay.

Thanks, it looks okay!
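For anyone reading the archive later, the shape of the fix is simply to
walk the open block devices and delegate to bdrv_close(). A rough sketch
(the "bdrv_first"/"next" list fields here are illustrative of how block.c
chains open devices, not necessarily the exact code in the patch):

    /* Sketch only: close every open block device properly.
     * bdrv_close() also tears down the backing image
     * (bs->backing_hd), which a plain per-device cleanup misses. */
    void bdrv_close_all(void)
    {
        BlockDriverState *bs;

        for (bs = bdrv_first; bs != NULL; bs = bs->next) {
            bdrv_close(bs);
        }
    }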

> We're currently not doing any locking in the read-only case, e.g. a backing
> image (except as a wrapper around bdrv_commit()). Is there a problem with
> one process accessing an image read-only while another accesses it
> read-write? If there is, we probably need to arrange to take an exclusive
> lock on read-write, and a shared lock on read-only so you can have multiple
> readers, but readers can't coexist with a writer.

Even read-only access is not allowed while another qemu has write
access. However, I don't think this is a problem for your patch.

This is because Sheepdog only allows images to be cloned from
snapshots, so a backing image is always read-only. If a user
specifies a writable Sheepdog VDI as a backing image, qemu-img
returns an error.
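To make the check concrete, it amounts to something like the following
at clone-creation time (a sketch only; sd_inode, snap_id, and the helper
names are illustrative, not the actual sheepdog driver API):

    #include <errno.h>

    /* Illustrative stand-in for the per-VDI metadata. */
    struct sd_inode {
        unsigned int snap_id;   /* non-zero for snapshot VDIs */
    };

    static int is_snapshot(const struct sd_inode *inode)
    {
        return inode->snap_id != 0;
    }

    /* Called when qemu-img creates a clone on top of a backing VDI:
     * refuse unless that VDI is a snapshot (hence read-only). */
    static int check_backing_vdi(const struct sd_inode *inode)
    {
        if (!is_snapshot(inode))
            return -EINVAL;  /* writable VDI -> qemu-img errors out */
        return 0;
    }

Since readers can only ever share a backing image with other readers,
the read-only/read-write coexistence problem does not arise there.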

> More generally, I'm a little bit concerned about stray locks. The claims
> persist until they are explicitly released, even if the connection from the
> qemu to the sheepdog cluster is terminated. This means that crashing qemu
> processes, dying hosts, etc. will always leave stale locks. I'm sure this
> will lead to a cluster maintenance nightmare, especially as qemu is still so
> sloppy about doing exit(1) all throughout the code whenever something
> happens that it doesn't like.
> 
> I appreciate that the sheepdog design means there isn't a single persistent
> connection which can be used to bound the lifetime of the lock, as you might
> have with (say) an NBD server. Maybe some sort of heartbeat contact with the
> qemu process should be required to keep the lock alive?

Yes, we must monitor whether VMs are alive so that locks are
released in all cases. This is on our TODO list.

Currently, I am considering the following approach: in the Sheepdog
design, every qemu host machine is a member of the Sheepdog cluster,
so a host that dies unexpectedly is already detectable through
cluster membership changes. If a crashed qemu process can also be
detected by the local cluster daemon on each host, we can monitor
VMs properly and release their locks.
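One possible shape for the daemon side (a sketch under the assumption
that each qemu keeps one connection to its local daemon; struct client
and release_vdi_locks() are hypothetical names, not existing sheepdog
code):

    #include <sys/epoll.h>
    #include <unistd.h>

    struct client;                             /* per-qemu connection */
    void release_vdi_locks(struct client *c);  /* hypothetical helper */

    /* If the socket reports HUP/ERR, the qemu process has exited or
     * crashed, so the daemon releases every lock held through that
     * connection.  A whole-host failure is already visible as a
     * cluster membership change and can release that host's locks
     * the same way. */
    static void client_handler(int fd, int events, struct client *c)
    {
        if (events & (EPOLLHUP | EPOLLERR)) {
            release_vdi_locks(c);   /* drop this VM's stale locks */
            close(fd);
            return;
        }
        /* ... otherwise handle a normal request ... */
    }

This avoids a separate heartbeat protocol: the lifetime of the local
connection bounds the lifetime of the lock, much like the NBD case you
mention.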

Regards,

MORITA Kazutaka



