[Sheepdog] Drive snapshots and metadata

Fri Feb 4 16:35:31 CET 2011

Hi. I'm looking at both Sheepdog and Ceph at the moment, and thinking about
future directions for our hosting product. We run qemu-kvm virtual machines
backed by LVM2 logical volumes as virtual drives, accessed either locally or
over iSCSI. I'm thinking of migrating in time to a distributed block store
like Sheepdog or Ceph's rbd, and have a handful of questions which have come
up while experimenting.

The operation I would really like to offer users (in addition to what we
already have in our LVM2-based system) is the ability to make copy-on-write
clones of virtual hard drives. I can create a snapshot of the source with
qemu-img snapshot, and then do

  qemu-img create -b sheepdog:source:1 sheepdog:dest

However, I think I can't then delete the snapshot source:1 and the original
source drive without also deleting the dest drive. Am I right about this, or
am I misunderstanding something, or out of date with the current state of
Sheepdog?
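
For concreteness, here is a rough sketch of the sequence I mean. The vdi
names and the snapshot tag are placeholders, and it assumes the first
snapshot of a vdi can be referenced as id 1, as in the command above. By
default it only prints the commands; it runs them only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the snapshot-then-clone sequence described above.
# SRC, DST, SNAP_TAG and SNAP_ID are illustrative placeholders.
SRC=${SRC:-source}
DST=${DST:-dest}
SNAP_TAG=${SNAP_TAG:-base}
SNAP_ID=${SNAP_ID:-1}   # assumption: first snapshot of the vdi has id 1

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

clone_vdi() {
    # 1. Take a snapshot of the source vdi.
    run qemu-img snapshot -c "$SNAP_TAG" "sheepdog:$SRC"
    # 2. Create a new vdi that uses that snapshot as its backing image.
    run qemu-img create -b "sheepdog:$SRC:$SNAP_ID" "sheepdog:$DST"
}

clone_vdi
```

What I would then like to do, and believe I can't, is remove source:1 and
source while keeping dest.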

Something else I'm contemplating is storage of metadata associated with
virtual drives, e.g. which user a drive belongs to, the user-provided drive
name, and other management-layer properties. Is there a way I can tag vdis
in Sheepdog with a few short keys and values? (I know I could build a
separate simple distributed database for this on top of the same corosync
backend that Sheepdog uses, but I'd like to avoid that if the metadata would
fit naturally within Sheepdog, as the total amount I'm looking to store is
tiny!)

Finally, I see the rather intrusive qemu patch I contributed in the early
days of Sheepdog to allow locking and live migration to coexist has been
superseded by the total removal of the Sheepdog locking requirement in
fe14318e31d8. This is a much nicer solution to the problem than mine! Out of
interest, what happens if several clients access a vdi at the same time?
Does it behave the same as accessing (say) an iSCSI block device from two
hosts, so that cluster filesystems can be made to work? Or are there weaker
guarantees on the ordering of writes, and/or problems with read-cache
consistency, that make it less useful?

Best wishes,

Chris.