[sheepdog] [PATCH 0/9] revive VDI locking mechanism

Tue Jul 15 15:16:27 CEST 2014

At Tue, 15 Jul 2014 09:32:00 +0200,
Fabian Zimmermann wrote:
> 
> Hi,
> > Under the vdi locking scheme implemented in this patchset, a VDI is
> > released in the following 3 cases:
> >
> > 1. qemu explicitly releases its VDI
> > 2. qemu process dies
> > 3. sheep process dies (in this case, VDIs opened by qemu processes
> > which connect to the sheep process as a gateway will be released)
> >
> > On second thought, cases 2 and 3 are not so good for the
> > integrity of VDI data, because in such cases write requests for
> > objects of the VDIs can be stopped before completion. This can leave
> > the objects in an inconsistent state. For example, assume that before
> > the stopped write request, the contents of the object are A, A, A (3
> > replicas). The sheep can die after the second write request is
> > issued. After that, the replicas can be B, B, A. If a new qemu process
> > tries to read the object, sheep can return either B or A because sheep
> > issues the read request to one randomly chosen node of the 3 nodes.
> > This behavior breaks the semantics of a block device!
> >
> > So I think it is safe to require "force unlocking + dog vdi check"
> > before launching a new qemu in the cases of sudden death of qemu and
> > node leaving. What do you think?
> first, thanks a lot for the work. I'm really glad to have this feature
> in the near future.
> 
> I think "force unlocking + dog vdi check" is fine, but may I add
> another question:
> 
> When does a write-cmd return? As soon as a quorum is acked by
> the sheepdog processes?

If there's no error, a write request returns when all replicas are
written.
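
As a rough illustration of this rule, here is a toy Python model. It is
not sheepdog code: the names replicated_write, ReplicaWriteError and
fail_after are hypothetical, and the one-replica-at-a-time write order
is an assumption made only to keep the sketch simple.

```python
class ReplicaWriteError(Exception):
    """Raised when a write stops before every replica is updated."""


def replicated_write(replicas, value, fail_after=None):
    """Write value to every replica, in order.

    fail_after is a hypothetical knob: the number of replicas that get
    updated before a simulated failure (e.g. a node leaving).
    Success is reported only when all replicas hold the new value.
    """
    for index in range(len(replicas)):
        if fail_after is not None and index >= fail_after:
            # The write sequence is interrupted: the caller gets an
            # error, and the replicas written so far keep the new value.
            raise ReplicaWriteError(
                f"stopped after {index} of {len(replicas)} replicas"
            )
        replicas[index] = value
    return True


# No error: the write returns only after all 3 replicas are written.
ok_replicas = ["A", "A", "A"]
completed = replicated_write(ok_replicas, "B")

# Interrupted after the first replica: the client sees an error and the
# replicas are left as B, A, A.
broken_replicas = ["A", "A", "A"]
try:
    replicated_write(broken_replicas, "B", fail_after=1)
    client_saw_error = False
except ReplicaWriteError:
    client_saw_error = True
```

In this model the client never gets a successful return for a
partially written object, only an error.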

> 
> I'm asking because if the replicas are B, A, A and the client thinks "B"
> is written, vdi-check would (correct me if I'm wrong) assume A is correct
> and overwrite the B-data. So ideally the client shouldn't have got its
> write-cmd back, else this would lead to an inconsistent state, wouldn't
> it?

If an inconsistent state arises (e.g. the above B, A, A), it means
the write sequence was interrupted by an error (e.g. a node leaving
because of a network error). So QEMU receives an error from sheep and
cannot issue more I/O requests to the VDI (the situation is similar to
unplugging an HDD from a running machine). In such a case, admins would
kill the QEMU process, so the killed QEMU will never see the
inconsistent state.

The problem is a new QEMU process that grabs the VDI used by the old
QEMU process. The new one can see the above inconsistent state.
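
A toy Python sketch of why the new process is affected, assuming (as
described in the quoted text) that each read goes to one randomly
chosen replica. The names read_object and is_consistent are
hypothetical, not sheepdog functions.

```python
import random


def read_object(replicas, rng):
    """Serve a read from one randomly chosen replica."""
    return rng.choice(replicas)


def is_consistent(replicas):
    """True when every replica holds the same contents."""
    return len(set(replicas)) == 1


# State left behind by the interrupted write discussed above.
replicas = ["B", "A", "A"]

# Repeated reads of the same object by a new client, with no writes in
# between. A fixed seed keeps the run reproducible.
rng = random.Random(0)
observed = {read_object(replicas, rng) for _ in range(200)}

print(sorted(observed))
print(is_consistent(replicas))
```

Repeated reads of the same object can return different contents even
though nothing was written in between, which a block device must never
do.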

As Valerio points out, this problem is not only related to the VDI
locking feature. To avoid this problem, we need to run "dog vdi check"
after the sudden death of a QEMU process or a node.

Thanks,
Hitoshi