[sheepdog] Is it necessary for outstanding io block leave/join event?
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Thu May 17 09:29:44 CEST 2012
At Thu, 17 May 2012 09:56:15 +0800,
Liu Yuan wrote:
>
> On 05/17/2012 01:57 AM, MORITA Kazutaka wrote:
>
> > At Wed, 16 May 2012 13:27:31 -0400,
> > Christoph Hellwig wrote:
> >>
> >> On Thu, May 17, 2012 at 01:33:14AM +0900, MORITA Kazutaka wrote:
> >>>> I think the rationale is to not change the cluster configuration while
> >>>> I/O is in progress. With the vnode_info structure making the cluster
> >>>> state during an I/O operation explicit (together with hdr->epoch)
> >>>> I suspect this isn't needed any more, but I want to do a full blown
> >>>> audit of the I/O path first.
> >>>
> >>> The reason is that the recovery algorithm assumes that all objects in
> >>> the older epoch are immutable, which means only the objects in the
> >>> current epoch are writable. If outstanding I/Os update objects in the
> >>> previous epoch after they have been recovered to the current epoch, the
> >>> replicas end up in an inconsistent state.
> >>
>
>
> This assumption seems unnecessary, at least for Farm, where I/O is
> always routed to objects in the working directory.
Really? I thought that this problem does not depend on the underlying
storage driver.
If there is one node, A, and the number of copies is 1, how does
Farm handle the following case?
- the user adds a second node, B, while there are in-flight I/Os on
  node A
- node A increments the epoch from 1 to 2, and node B recovers
  objects from epoch 1 on node A
- after node B has received the objects into epoch 2, the in-flight
  I/Os on node A update objects in epoch 1 on node A
- node A responds to the clients with success, but the updated data
  will be lost??
> I think both recovery code and any assumptions need to be revisited and
> possibly this is a long term issue.
Agreed.
Thanks,
Kazutaka