[Sheepdog] VDI locking, migration and qemu

Fri Nov 27 14:43:52 CET 2009

MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:

> How about qemu-img/qemu-io?
> These utilities also write data to VM images.
> bdrv_claim/release need to be added to qemu-img.c and qemu-io.c, right?

Oops, I didn't know qemu-io existed; added that to the patch! I've updated
qemu-img.c too. I also noticed that your bdrv_close_all() function didn't do
things like close backing images correctly, so I've changed it to call
bdrv_close() (which does do the right thing) and reindented in standard qemu
style. Hope that's okay.

We're currently not doing any locking in the read-only case, e.g. for a backing
image (except as a wrapper around bdrv_commit()). Is there a problem with
one process accessing an image read-only while another accesses it
read-write? If there is, we probably need to arrange to take an exclusive
lock on read-write, and a shared lock on read-only so you can have multiple
readers, but readers can't coexist with a writer.
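
To make the shared/exclusive idea concrete, here is a minimal sketch of the
rule: any number of read-only claims may coexist, but a read-write claim
excludes everything else. This is an in-memory model only, not the patch
itself; the names vdi_claims, claim_mode, vdi_claim() and vdi_release() are
hypothetical and do not correspond to the actual bdrv_claim/release code or
to the sheepdog protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical claim modes; not taken from the actual patch. */
enum claim_mode { CLAIM_SHARED, CLAIM_EXCLUSIVE };

/* Per-image claim state as the cluster might track it (assumption). */
struct vdi_claims {
    int readers;   /* number of shared (read-only) claims held */
    bool writer;   /* true while an exclusive (read-write) claim is held */
};

/* Try to take a claim; returns false if it conflicts with existing claims. */
static bool vdi_claim(struct vdi_claims *c, enum claim_mode mode)
{
    if (mode == CLAIM_EXCLUSIVE) {
        /* A writer cannot coexist with readers or with another writer. */
        if (c->writer || c->readers > 0) {
            return false;
        }
        c->writer = true;
        return true;
    }
    /* Readers may share the image unless a writer holds it. */
    if (c->writer) {
        return false;
    }
    c->readers++;
    return true;
}

/* Drop a claim previously granted by vdi_claim() in the same mode. */
static void vdi_release(struct vdi_claims *c, enum claim_mode mode)
{
    if (mode == CLAIM_EXCLUSIVE) {
        assert(c->writer);
        c->writer = false;
    } else {
        assert(c->readers > 0);
        c->readers--;
    }
}
```

Under this model a backing image opened read-only by several guests would
take CLAIM_SHARED each time, and an attempt to open the same image
read-write would fail until the last reader released it.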

More generally, I'm a little bit concerned about stray locks. The claims
persist until they are explicitly released, even if the connection from
qemu to the sheepdog cluster is terminated. This means that crashing qemu
processes, dying hosts, etc. will always leave stale locks. I'm sure this
will lead to a cluster maintenance nightmare, especially as qemu is still so
sloppy about doing exit(1) all throughout the code whenever something
happens that it doesn't like.

I appreciate that the sheepdog design means there isn't a single persistent
connection which can be used to bound the lifetime of the lock, as you might
have with (say) an NBD server. Maybe some sort of heartbeat contact with the
qemu process should be required to keep the lock alive?
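
One way such a heartbeat could work is a lease: the claim carries an expiry
time, the holder must renew it periodically, and anyone may take over a
claim whose lease has lapsed. The sketch below is only a model of that idea
under stated assumptions. The structure, the function names, the client ids
and the 30-second timeout are all hypothetical; nothing here is existing
sheepdog or qemu code, and the caller supplies the clock so the logic can be
exercised without real time passing.

```c
#include <stdbool.h>
#include <stdint.h>

#define LEASE_TIMEOUT_SEC 30   /* hypothetical value, chosen for illustration */

/* Lease state for one VDI as the cluster might store it (assumption). */
struct vdi_lease {
    bool held;            /* has anyone ever taken the lease and not lost it */
    uint32_t owner;       /* hypothetical id of the claiming qemu process */
    uint64_t expires_at;  /* time in seconds after which the lease is stale */
};

/* A lease only counts while it is held and has not yet expired. */
static bool lease_is_live(const struct vdi_lease *l, uint64_t now)
{
    return l->held && now < l->expires_at;
}

/* Take the lease if it is free, stale, or already ours. */
static bool lease_acquire(struct vdi_lease *l, uint32_t client, uint64_t now)
{
    if (lease_is_live(l, now) && l->owner != client) {
        return false;
    }
    l->held = true;
    l->owner = client;
    l->expires_at = now + LEASE_TIMEOUT_SEC;
    return true;
}

/* Heartbeat: extend the lease, but only for the current live owner. */
static bool lease_heartbeat(struct vdi_lease *l, uint32_t client, uint64_t now)
{
    if (!lease_is_live(l, now) || l->owner != client) {
        return false;
    }
    l->expires_at = now + LEASE_TIMEOUT_SEC;
    return true;
}
```

With this scheme a crashed qemu or a dead host simply stops sending
heartbeats and its claim goes stale after the timeout, with no manual
cleanup. The cost is that a holder whose heartbeat is rejected has to stop
writing to the image at once, since another process may already own it.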

Cheers,

Chris.