[sheepdog] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'

Kevin Wolf kwolf at redhat.com
Fri Jun 7 17:22:46 CEST 2013


Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
> > Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
> >> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> >>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
> >> Only when a write comes from the VM do we do the following:
> >>  - delete the active vdi A
> >>  - create a new inode based on the previously reloaded As1's inode
> > 
> > Thanks, this is the part that I missed.
> > 
> > I'm not sure however why the actual switch is delayed until the first
> > write. This seems inconsistent with qcow2 snapshots.
> > 
> > Do you know if there is a reason why we can't always do this already
> > during bdrv_snapshot_goto()?
> > 
> 
> I think the reason is that sd_load_vmstate() needs to load the VM
> state objects with the correct inode object.
> 
> I tried to remove
> 
>   if (!s->inode.vm_state_size)
> 
> and make sd_create_branch() unconditional. This means the 'loadvm'
> command will call sd_create_branch() inside sd_snapshot_goto(). But it
> failed by reading the wrong snapshot, because the vdi's inode object
> had already been changed by sd_create_branch().

Ok, I think I'm starting to understand how these things work. Basically,
qemu's savevm functionality is designed to work with qcow2 like this:

1. Save the VM state to the active layer
2. create_snapshot saves the active layer including VM state
3. [ Forget to remove the VM state from the active layer :-) ]
4. loadvm loads the snapshot again, including VM state
5. VM state is restored from the active layer

So for Sheepdog, the problem is that step 2 doesn't include the VM state,
right? Then our options are either to change Sheepdog so that step 2 does
move the VM state along (which might end up rather ugly), or to swap
steps 1 and 2, so that you first switch to the new snapshot and then
write the VM state.
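
To make that ordering concrete, here's a stand-alone toy model of the
two orderings (plain C with made-up types and helpers, not the real
qemu or sheepdog code; it only illustrates where the VM state ends up):

#include <stdio.h>
#include <stdbool.h>

struct layer { bool has_vmstate; };
struct image { struct layer active, snap; };

/* qcow2-style: the snapshot copies the whole active layer, VM state
 * included */
static void snapshot_copies_vmstate(struct image *img)
{
    img->snap = img->active;
}

/* sheepdog-style: the snapshot carries only the disk data */
static void snapshot_disk_only(struct image *img)
{
    img->snap = img->active;
    img->snap.has_vmstate = false;
}

int main(void)
{
    struct image a = { { false }, { false } };
    struct image b = { { false }, { false } };

    /* current ordering: 1. save VM state, 2. create snapshot */
    a.active.has_vmstate = true;
    snapshot_copies_vmstate(&a);
    printf("qcow2-style, current order: snapshot has vmstate: %d\n",
           a.snap.has_vmstate);

    /* the same ordering with a disk-only snapshot loses the VM state */
    b.active.has_vmstate = true;
    snapshot_disk_only(&b);
    printf("disk-only,   current order: snapshot has vmstate: %d\n",
           b.snap.has_vmstate);

    /* swapped ordering: create the snapshot first, then store the VM
     * state with the new snapshot */
    b = (struct image){ { false }, { false } };
    snapshot_disk_only(&b);
    b.snap.has_vmstate = true;
    printf("disk-only,   swapped order: snapshot has vmstate: %d\n",
           b.snap.has_vmstate);

    return 0;
}

With a disk-only snapshot, only the swapped ordering leaves the VM
state attached to the snapshot.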

Because I think what we're doing currently is not only hard to
understand, but actually wrong. Consider this:

1. savevm with a Sheepdog image
2. Exit qemu before any write request is sent
3. Restart qemu with the Sheepdog image and notice how the wrong
   snapshot is active. Oops.

Or does Sheepdog not have any notion of an "active snapshot", so that
this is only a runtime decision?
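
For reference, the eager variant I have in mind would be roughly the
following. This is only a stand-alone sketch with stand-in types and
stubbed helpers; the real BDRVSheepdogState, reload_inode() and
sd_create_branch() look different, and the VM state question from
above still applies:

#include <stdio.h>

/* stand-in for BDRVSheepdogState */
struct sd_state {
    int inode_vid;       /* which vid the inode currently points at */
    int branch_pending;  /* copy-on-write branch still outstanding? */
};

/* stand-in for reload_inode(): point the inode at the snapshot's vid */
static int reload_inode(struct sd_state *s, int snap_vid)
{
    s->inode_vid = snap_vid;
    return 0;
}

/* stand-in for sd_create_branch(): delete the old active vdi and
 * create a new writable inode based on the snapshot's inode */
static int sd_create_branch(struct sd_state *s)
{
    s->inode_vid += 1000;    /* pretend a new vid was allocated */
    s->branch_pending = 0;
    return 0;
}

static int sd_snapshot_goto(struct sd_state *s, int snap_vid)
{
    int ret = reload_inode(s, snap_vid);
    if (ret < 0) {
        return ret;
    }

    /* Anything that needs the snapshot's inode, like reading the VM
     * state objects, would have to happen before this point, or the
     * snapshot's vid would have to be kept around for it. */

    /* Branch right away instead of waiting for the first guest write,
     * so that quitting qemu here leaves the right vdi active. */
    return sd_create_branch(s);
}

int main(void)
{
    struct sd_state s = { 0, 1 };
    sd_snapshot_goto(&s, 42);
    printf("active vid after goto: %d\n", s.inode_vid);
    return 0;
}

The open question is then only where sd_load_vmstate() gets the
snapshot's inode from, which is what your experiment above ran into.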

> >> The chain will look like:
> >>
> >> As1 --> As2
> >>  |
> >>  V
> >>  A
> >>
> >> This is how sheepdog handles savevm/loadvm.
> >>
> >> So for 'qemu-img snapshot -a', we should create the branch in
> >> .bdrv_snapshot_goto.
> >>
> >> As you pointed out, we need to consider the VM state even for
> >> 'snapshot -a', so I need to rework patch 2/2.
> > 
> > Yes, the presence of VM state is independent from whether you're using
> > qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
> > to snapshots that have a VM state, and loadvm can be used with images
> > that don't have a VM state (if you have multiple images, only one of
> > them has the VM state).
> > 
> 
> That doesn't seem to be true of the current code. If I 'loadvm' a
> snapshot without a vmstate, I'll get 'qemu-system-x86_64: This is a
> disk-only snapshot. Revert to it offline using qemu-img'.

Try again with multiple disks. Only one disk gets the VM state, the
other ones get only the disk snapshot.

> But 'qemu-img snapshot -a' works as you said: it can roll back to the
> snapshot regardless of vmstate.
> 
> Also this is a difference to store vmstate for sheepdog images. *Every*
> snapshot image can have its own vmstate stored in sheepdog cluster. That
> is, we can have multiple snapshot with its own private vmstate for sheepdog.

Sorry, can't parse. :-(

> I think my patch does the correct thing: it just rolls back the disk
> state of the snapshot for 'qemu-img snapshot -a'. Anyway, I found a
> minor issue in patch 2/2, so I'll resend the series.

I think I agree that the problem lies less with your patch and more with
the preexisting code. We should get it sorted out anyway.

Kevin


