[Sheepdog] Deleting snapshots
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Thu Oct 7 08:00:02 CEST 2010
At Thu, 7 Oct 2010 06:28:55 +0200,
Floris Bos wrote:
>
> Hi,
>
> On Thursday, October 07, 2010 05:13:40 am you wrote:
> > > Basically what I need is a new read-only snapshot for use by my client,
> > > and no changes to the current VDI.
> > > After all, the current vdi may be in use by qemu, and qemu is totally
> > > unaware of the snapshot I'm taking with my external program.
> > >
> > > So the original VDI ID must stay writable, as there is no way to signal
> > > qemu that it should start using another id.
> >
> > On second thought, we cannot avoid updating a vdi id when its snapshot
> > is created. This is because a sheepdog client decides whether to do
> > copy-on-write based on the vdi id.
> >
> > So we need to use the savevm command from the qemu monitor to take a
> > snapshot of the running VM. Currently, if you want to create a
> > snapshot from an external program, you need to take a lock on the vdi
> > to avoid corrupting running VMs, and if a running VM exists, you have
> > to give up on taking the snapshot...
> >
> > In the future, I think we should implement a mechanism to notify the
> > running client when an external program creates a snapshot.
> >
> > For example, if write accesses to snapshot objects return something
> > like SD_RES_READONLY_OBJ, we can tell the client that it should update
> > the vdi id.
>
> So the VDI ID decides whether writes are done in-place or with COW.
> Does this also mean that after taking a snapshot, all updates are done
> using COW, even if the only snapshot that existed is deleted later?
>
Yes, exactly.
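To make that concrete: the inode object of a vdi records, for each 4 MB
data object, the vdi id that created it, and a write can go in-place
only when that recorded id matches the current vdi id. A minimal sketch
of the decision (the names here are simplified stand-ins, not the real
sheepdog structures):

/* Sketch of the per-object COW decision.  struct sd_inode here is a
 * simplified stand-in for the real sheepdog inode object. */
#include <stdint.h>
#include <stdbool.h>

#define MAX_DATA_OBJS 1024        /* one entry per 4 MB data object */

struct sd_inode {
    uint32_t vdi_id;                      /* current, writable vdi id */
    uint32_t data_vdi_id[MAX_DATA_OBJS];  /* vdi id that created each
                                             data object */
};

/* Objects created under an older vdi id belong to a snapshot, so the
 * client must copy them before modifying them; only objects created
 * under the current id may be updated in-place. */
static bool needs_cow(const struct sd_inode *inode, int idx)
{
    return inode->data_vdi_id[idx] != inode->vdi_id;
}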
>
> In the typical use case of making a backup, the snapshot only exists
> for a couple of minutes:
>
> 1) a temporary read-only snapshot is made
> 2) rsync (or another legacy program) reads all the data from the
> snapshot and sends it to the external backup server
> 3) the temporary snapshot is deleted again
>
> If qemu continues to use COW for updates afterwards, I assume this affects
> performance, as a 4 MB object has to be read, updated, and written again, even
> if only a 512-byte sector is changed?
>
>
> Ideally there should be a way to signal the client to use COW only
> temporarily (while any snapshots exist), and to signal it again that it
> can resume updating in-place once no snapshots remain.
I think this kind of feature would be useful in practice. If the sheep
daemon could tell the virtual machine to go back to the previous vdi id,
we could achieve this easily, I think. In that case, write accesses to
the objects that were already updated during the rsync would be done in
a copy-on-write way again, and all other accesses would go in-place.
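On the cost you mention: with those numbers, rewriting a single
512-byte sector of a COW object moves 4 MB in and 4 MB out, roughly a
16,000x I/O amplification in the worst case, so avoiding permanent COW
matters. A toy walk-through of the revert idea, with made-up vdi ids
(0x11 before the backup snapshot, 0x12 while it existed):

/* Toy illustration of reverting to the previous vdi id; all ids are
 * made up for the example. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t current_vid = 0x11;   /* reverted after snapshot deletion */
    uint32_t creator[] = {
        0x11,   /* object untouched while the snapshot existed */
        0x12,   /* object rewritten during the rsync window */
    };

    for (int i = 0; i < 2; i++)
        printf("object %d: %s\n", i,
               creator[i] == current_vid ? "written in-place"
                                         : "copied once more (COW)");
    return 0;
}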
>
> Asynchronous notification might be relatively complicated to implement
> in the qemu block driver, though.
> I wonder if it might be more practical to move some of the low-level
> logic that currently lives in the qemu client itself into sheep,
> and let sheep offer a simplified protocol to the client, one that does
> not think in low-level details like which object the data should be
> written to, but just specifies an "offset" and a "data length" to write
> within the image.
> Sheep could then decide whether or not COW should be used, and also
> manage other low-level details (like updating the inode metadata)
> instead of the client.
>
Yes, that might be more practical, but it would make the sheep daemon
more complicated. The current sheepdog implementation presents something
like a simple object store to the qemu block driver, and I think that
managing which objects should be updated in a COW way is out of scope
for the object store.
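For concreteness, the simplified request you describe might look
something like the struct below. This is purely hypothetical; nothing
like it exists in the current protocol:

/* Hypothetical client request if sheep handled the object mapping:
 * the client names only an image-level byte range, and sheep would
 * pick the objects, decide on COW, and update the inode itself. */
#include <stdint.h>

struct simple_vdi_write {
    uint32_t vdi_id;    /* image to write to */
    uint64_t offset;    /* byte offset within the image */
    uint32_t data_len;  /* length of the payload that follows */
    /* data_len bytes of payload follow on the wire */
};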
I guess the notification feature is not so complicated. All the qemu
block driver has to do is update its vdi object when it receives
SD_RES_READONLY in the response to a write operation.
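Roughly, the write path would only grow one retry branch. In the sketch
below the result-code value and both helpers are illustrative
placeholders, not existing qemu or sheepdog names:

/* Sketch of retrying a write after SD_RES_READONLY. */
#include <stdint.h>

#define SD_RES_READONLY 0x1a   /* assumed value, for illustration only */

/* stubs standing in for the real driver code paths */
static int sd_do_write(uint64_t oid, const void *buf, uint32_t len)
{
    return 0;
}
static void sd_reload_inode(void)
{
    /* re-read the vdi (inode) object to pick up the new vdi id */
}

static int write_with_retry(uint64_t oid, const void *buf, uint32_t len)
{
    int ret = sd_do_write(oid, buf, len);
    if (ret == SD_RES_READONLY) {
        /* an external snapshot made our objects read-only: fetch the
         * new vdi id, then retry; the write now goes copy-on-write */
        sd_reload_inode();
        ret = sd_do_write(oid, buf, len);
    }
    return ret;
}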
Thanks,
Kazutaka