[sheepdog] Is it necessary for outstanding io block leave/join event?

Wed May 16 19:27:31 CEST 2012

On Thu, May 17, 2012 at 01:33:14AM +0900, MORITA Kazutaka wrote:
> > I think the rationale is to not change the cluster configuration while
> > I/O is in progress.  With the vnode_info structure making the cluster
> > state during an I/O operation explicit (together with hdr->epoch)
> > I suspect this isn't needed any more, but I want to do a full-blown
> > audit of the I/O path first.
> 
> The reason is that the recovery algorithm assumes that all objects in
> the older epoch are immutable, which means only the objects in the
> current epoch are writable.  If outstanding I/Os update objects in the
> previous epoch after they are recovered to the current epoch, their
> replicas result in inconsistent state.

That's definitely a problem for both approaches that update the epoch
and node information directly from the main thread, as I/O can still be
in flight at that point.
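
To make the hazard concrete, here is a minimal sketch of the condition
described above.  All names in it (struct inflight_write, admit_write,
write_is_stale) are made up for illustration and are not taken from the
sheepdog source; it only assumes that each request records the epoch it
was admitted under.  A write that passes the admission check and is still
in flight when the epoch is bumped ends up targeting an object that
recovery treats as immutable.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical and not taken from the sheepdog source. */
struct inflight_write {
	uint32_t epoch;	/* cluster epoch when the request was admitted */
};

/* Admission check: only requests for the current epoch may start. */
static bool admit_write(struct inflight_write *w, uint32_t req_epoch,
			uint32_t cur_epoch)
{
	if (req_epoch != cur_epoch)
		return false;
	w->epoch = req_epoch;
	return true;
}

/*
 * True if the write was admitted under an older epoch than the one that
 * is current now, i.e. completing it would modify an object that the
 * recovery algorithm assumes is immutable.
 */
static bool write_is_stale(const struct inflight_write *w, uint32_t cur_epoch)
{
	return w->epoch < cur_epoch;
}
```

The admission check alone does not help here: a request admitted at
epoch N is only caught by write_is_stale() if something looks at it again
after the main thread has moved to epoch N+1.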