[sheepdog] [PATCH 2/2] sheepdog: support 'qemu-img snapshot -a'
Kevin Wolf
kwolf at redhat.com
Fri Jun 7 18:48:26 CEST 2013
On 07.06.2013 at 18:14, Liu Yuan wrote:
> On 06/07/2013 11:22 PM, Kevin Wolf wrote:
> > On 07.06.2013 at 15:48, Liu Yuan wrote:
> >> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
> >>> On 06.06.2013 at 15:09, Liu Yuan wrote:
> >>>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> >>>>> On 06.06.2013 at 13:57, Liu Yuan wrote:
> >>>> Only when the write comes from the VM do we do the following:
> >>>> - delete the active vdi A
> >>>> - create a new inode based on the previously reloaded As1's inode
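> >>>>
> >>>> The trigger lives in the write path; condensed from sheepdog.c, it
> >>>> looks roughly like this:
> >>>>
> >>>>     /* in sd_co_rw_vector() */
> >>>>     if (acb->aiocb_type == AIOCB_WRITE_UDATA && s->is_snapshot) {
> >>>>         /* first write to a snapshot vdi: delete the old active
> >>>>          * vdi and branch off a new writable one */
> >>>>         ret = sd_create_branch(s);
> >>>>         if (ret) {
> >>>>             acb->ret = -EIO;
> >>>>             goto out;
> >>>>         }
> >>>>     }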
> >>>
> >>> Thanks, this is the part that I missed.
> >>>
> >>> I'm not sure, however, why the actual switch is delayed until the
> >>> first write. This seems inconsistent with qcow2 snapshots.
> >>>
> >>> Do you know if there is a reason why we can't always do this already
> >>> during bdrv_snapshot_goto()?
> >>>
> >>
> >> I think the reason is that sd_load_vmstate() needs to load the vm
> >> state objects with the correct inode object.
> >>
> >> I tried to remove
> >>
> >> if (!s->inode.vm_state_size)
> >>
> >> and make sd_create_branch() unconditional. This means the 'loadvm'
> >> command will call sd_create_branch() inside sd_snapshot_goto(). But it
> >> failed by reading the wrong snapshot, because the vdi's inode object
> >> had already been changed by sd_create_branch().
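> >>
> >> For reference, that check sits in sd_snapshot_goto() right after the
> >> snapshot's inode has been reloaded; roughly:
> >>
> >>     if (!s->inode.vm_state_size) {
> >>         error_report("snapshot %d does not exist", snapid);
> >>         ret = -ENOENT;
> >>         goto out;
> >>     }
> >>     s->is_snapshot = true;  /* branching deferred to the first write */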
> >
> > Ok, I think I'm starting to understand how these things work.
> > Basically, qemu's savevm functionality is designed to work with qcow2
> > like this:
> >
> > 1. Save the VM state to the active layer
> > 2. create_snapshot saves the active layer including VM state
> > 3. [ Forget to remove the VM state from the active layer :-) ]
> > 4. loadvm loads the snapshot again, including VM state
> > 5. VM state is restored from the active layer
> >
> > So for Sheepdog, the problem is that step 2 doesn't include the VM
> > state, right? So our options are either to change Sheepdog so that
> > step 2 does involve moving the VM state (which might end up rather
> > ugly), or to swap steps 1 and 2, so that you first switch to the new
> > snapshot and then write the VM state.
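> >
> > In code, do_savevm() and load_vmstate() in savevm.c boil down to
> > something like this (heavily condensed):
> >
> >     /* do_savevm() */
> >     qemu_savevm_state(f);           /* 1. VM state -> active layer */
> >     bdrv_snapshot_create(bs, &sn);  /* 2. snapshot incl. VM state  */
> >
> >     /* load_vmstate() */
> >     bdrv_snapshot_goto(bs, name);   /* 4. back to the snapshot      */
> >     qemu_loadvm_state(f);           /* 5. VM state <- active layer  */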
> >
>
> Sorry, I didn't fully understand the above example. When sheepdog takes
> a snapshot, the vmstate goes into the cluster storage together with the
> exact current active disk, e.g.:
>
> 1 we take two snapshots of the disk 'test', with tags A and B respectively
> 2 disk(A) + vmstate(A) will be stored indexed by A's vdi
>   disk(B) + vmstate(B) will be stored indexed by B's vdi
>
> The chain is A --> B --> C, where C is the current active vdi. Note that
> A, B and C all have different vdi_ids.
>
> The problem shows up when we do 'loadvm A'. The process is:
>
> 1 reload the inode of A (in snapshot_goto)
> 2 read the vmstate via A's vdi_id (loadvm_state)
> 3 delete C and create C1, reload the inode of C1 (sd_create_branch on write)
>
> So if we called sd_create_branch() at stage 1, the inode would point to
> C1, and stage 2 would fail to read the vmstate because the vdi_id would
> be C1's.
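>
> In other words, the vmstate objects are addressed through the current
> inode's vdi_id. Condensed from do_load_save_vmstate(), the read side is
> roughly:
>
>     uint32_t vid = s->inode.vdi_id;   /* must still be A's vdi_id */
>     oid = vid_to_vmstate_oid(vid, idx);
>     ret = read_object(fd, data, oid, ...);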
>
> This is why we rely on the first write to call sd_create_branch(). I am
> not sure we can fix sheepdog's loadvm without touching the core code of
> QEMU.
Hm, yes, I was confused. :-)
Anyway, the options stay the same: either C1 must somehow inherit the VM
state from A at the Sheepdog level, or we must make sure to get this
order:
1. Switch to (read-only) snapshot A
2. Load the VM state
3. Create (writable) snapshot C1 and switch to it
This is a different order than the one qcow2 requires (where C1 would
inherit the VM state from A, so our current order of first 1, then 3,
then 2 works), but if we can't fix it on the Sheepdog side, we'll have to
touch the core code of qemu. I don't much like touching the block.c code
for this, but if it's required to fix a bug, let's just do it.
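
To be concrete, load_vmstate() would end up doing something like the
following, where bdrv_snapshot_branch() is a made-up name for a new
driver hook meaning "branch off the writable vdi now":

    bdrv_snapshot_goto(bs, name);   /* 1. switch to read-only A       */
    qemu_loadvm_state(f);           /* 2. VM state still read via A   */
    bdrv_snapshot_branch(bs);       /* 3. create C1 and switch to it  */
                                    /*    (hypothetical new callback) */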
> > Because I think what we're doing currently is not only hard to
> > understand, but actually wrong. Consider this:
> >
> > 1. savevm with a Sheepdog image
> > 2. Exit qemu before any write request is sent
> > 3. Restart qemu with the Sheepdog image and notice how the wrong
> > snapshot is active. Oops.
> >
>
> Sheepdog indeed has this problem.
Thanks for confirming.
Kevin