[Sheepdog] Deleting snapshots

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Oct 7 08:00:02 CEST 2010


At Thu, 7 Oct 2010 06:28:55 +0200,
Floris Bos wrote:
> 
> Hi,
> 
> On Thursday, October 07, 2010 05:13:40 am you wrote:
> > > Basically what I need is a new read-only snapshot for use by my client,
> > > and no changes to the current VDI.
> > > After all, the current vdi may be in use by qemu, and qemu is totally
> > > unaware of the snapshot I'm taking with my external program.
> > > 
> > > So the original VDI ID must stay writable, as there is no way to signal
> > > qemu that it should start using another id.
> > 
> > On second thought, we cannot avoid updating a vdi id when its
> > snapshot is created.  This is because a sheepdog client decides
> > whether to do copy-on-write based on the vdi id.
> > 
> > So we need to use the savevm command from the qemu monitor to take a
> > snapshot of the running VM.  Currently, if you want to create a
> > snapshot from an external program, you need to take a lock on the vdi
> > to avoid corrupting running VMs, and if a running VM exists, you have
> > to give up taking the snapshot...
> > 
> > In the future, I think we should implement a mechanism to notify the
> > running client that an external program has created a snapshot.
> > 
> > For example, if write accesses to snapshot objects returned something
> > like SD_RES_READONLY_OBJ, we could tell the client that it should
> > update the vdi id.
> 
> So the VDI ID decides whether writes are done in place or COW is used.
> Does this also mean that after taking a snapshot, all updates are done
> using COW, even if the only snapshot that existed is deleted later?
> 

Yes, exactly.
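
To make that concrete, the decision comes down to a per-object vdi id
comparison in the inode.  A minimal sketch in C (the struct layout and
names below are illustrative, not the exact on-disk format):

  #include <stdint.h>

  #define MAX_DATA_OBJS (1ULL << 20)  /* illustrative: 4 MB * 1M objects */

  struct sd_inode {
      uint32_t vdi_id;                      /* id of the current writable vdi */
      uint32_t data_vdi_id[MAX_DATA_OBJS];  /* owner vdi of each 4 MB object */
  };

  /* An in-place write is allowed only if the data object already
   * belongs to the current vdi; otherwise the client must first copy
   * the object under the current vdi id (COW). */
  static int is_data_obj_writable(const struct sd_inode *inode,
                                  unsigned int idx)
  {
      return inode->vdi_id == inode->data_vdi_id[idx];
  }

Since taking a snapshot gives the writable image a new vdi id, every
object still owned by the old id fails this test and is copied on its
first write, even after the snapshot itself has been deleted.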

> 
> In the typical use case of making a backup, the snapshot only exists
> for a couple of minutes:
> 
> 1) a temporary read-only snapshot is made
> 2) rsync (or another legacy program) reads all the data from the
> snapshot and sends it to the external backup server
> 3) the temporary snapshot is deleted again
> 
> If qemu continues to use COW for updates afterwards, I assume this affects 
> performance, as a 4 MB object has to be read, updated, and written again, even 
> if only a 512-byte sector is changed?
> 
> 
> Ideally there should be a way to signal the client to use COW only
> temporarily (while any snapshots exist), and to signal it again that it
> can resume updating in place once no snapshots remain.

I think this kind of feature would be useful in practice.  If a sheep
daemon could tell the virtual machine to use the previous vdi id, we
could achieve this feature easily, I think.  In that case, write
accesses to the objects which were already updated during the rsync
would be done in a copy-on-write way again, and other accesses would be
done in place.
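
Building on the check above, the notification handler itself could be
tiny; the helper below is hypothetical and only meant to make the
object ownership concrete:

  /* Hypothetical handler for a "revert vdi id" message from sheep,
   * sent once the temporary snapshot has been deleted. */
  static void revert_vdi_id(struct sd_inode *inode, uint32_t prev_id)
  {
      inode->vdi_id = prev_id;
      /* Objects written while the snapshot existed are owned by the
       * temporary id, so they fail is_data_obj_writable() under
       * prev_id and are copied once more; untouched objects still
       * carry prev_id and keep writing in place. */
  }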

> 
> Asynchronous notification might be relatively complicated to implement
> in the qemu block driver, though.
> I wonder if it might be more practical to move some of the low-level
> logic that currently lives in the qemu client into sheep, and let sheep
> offer a simplified protocol in which the client does not think in
> low-level details like which object the data should be written to, but
> just specifies an "offset" and a "data length" within the image.
> Sheep could then decide whether or not COW should be used, and also
> manage other low-level details (like updating the inode metadata)
> instead of the client.
> 

Yes, that might be more practical, but it would make the sheep daemon
more complicated.  The current sheepdog implementation provides
something like simple object storage to the qemu block driver, and I
think that managing which objects should be updated in a COW way is out
of scope for that object storage.
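
For comparison, the request format Floris describes might look roughly
like this; everything here is hypothetical, since sheepdog's actual
protocol works at the object level:

  #include <stdint.h>

  /* Hypothetical higher-level request: the client names only a byte
   * range in the image, and sheep itself maps the range to objects,
   * decides whether COW is needed, and updates the inode metadata. */
  struct sd_image_io_req {
      uint32_t vdi_id;   /* image to operate on */
      uint8_t  opcode;   /* read or write */
      uint64_t offset;   /* byte offset within the image */
      uint32_t length;   /* number of payload bytes that follow */
  };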

I guess the notification feature is not so complicated.  All the qemu
block driver has to do is update its vdi object when it receives
SD_RES_READONLY in the response to a write operation.
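
Sketched out, the driver-side handling is just a reload and a retry
(reusing the sd_inode sketch above; the other names here are
illustrative, not real sheepdog APIs, and the error code value is a
placeholder):

  #include <stddef.h>
  #include <stdint.h>

  enum { SD_RES_READONLY = 0x80 };  /* proposed code; value is a placeholder */

  struct sd_state {
      struct sd_inode inode;   /* cached copy of the vdi object */
      /* connection state etc. omitted */
  };

  extern int write_data_obj(struct sd_state *s, uint64_t oid,
                            const void *buf, size_t len, uint64_t offset);
  extern uint64_t data_oid(uint32_t vdi_id, uint64_t offset);
  extern void reload_inode(struct sd_state *s);

  static int sd_write(struct sd_state *s, uint64_t offset,
                      const void *buf, size_t len)
  {
      int ret = write_data_obj(s, data_oid(s->inode.vdi_id, offset),
                               buf, len, offset);
      if (ret == SD_RES_READONLY) {
          /* An external program snapshotted this vdi: re-read the vdi
           * object to pick up the new writable vdi id... */
          reload_inode(s);
          /* ...and retry.  The object now belongs to the snapshot, so
           * the retried write takes the normal copy-on-write path. */
          ret = write_data_obj(s, data_oid(s->inode.vdi_id, offset),
                               buf, len, offset);
      }
      return ret;
  }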


Thanks,

Kazutaka


