[sheepdog] Is it necessary for outstanding io block leave/join event?

Wed May 16 19:57:11 CEST 2012

At Wed, 16 May 2012 13:27:31 -0400,
Christoph Hellwig wrote:
> 
> On Thu, May 17, 2012 at 01:33:14AM +0900, MORITA Kazutaka wrote:
> > > I think the rationale is to not change the cluster configuration while
> > > I/O is in progress.  With the vnode_info structure making the cluster
> > > state during an I/O operation explicit (together with hdr->epoch)
> > > I suspect this isn't needed any more, but I want to do a full blown
> > > audit of the I/O path first.
> > 
> > The reason is that the recovery algorithm assumes that all objects in
> > the older epoch are immutable, which means only the objects in the
> > current epoch are writable.  If outstanding I/Os update objects in the
> > previous epoch after they are recovered to the current epoch, their
> > replicas result in inconsistent state.
> 
> That's definitely a problem for both approaches to updating the epoch and
> node information directly from the main thread as I/O can still be in
> flight at this point.

Hmm, in the previous implementation, Sheepdog flushed all in-flight
I/Os before processing join/leave events, and blocked new I/Os until
sheep had updated the epoch and node information.  It seems that the
current code is broken...  IIUC, process_request_queue() must not be
called while event_running == 1.
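A minimal, single-threaded C sketch of that ordering follows. It assumes the flow is: set event_running, hold back new requests, wait for in-flight I/O to drain, bump the epoch, then clear the flag and dispatch the queued requests against the new epoch. Only event_running and process_request_queue() come from the discussion above. nr_outstanding_io, current_epoch, queue_request(), io_done(), begin_event() and try_finish_event() are hypothetical names for illustration and are not sheepdog's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state: not sheepdog's real data structures. */
#define MAX_QUEUED 16

struct request {
    int id;
    unsigned int epoch;                /* epoch the request is dispatched against */
};

static int event_running;              /* 1 while a join/leave event is pending */
static int nr_outstanding_io;          /* dispatched but not yet completed I/Os */
static unsigned int current_epoch = 1;
static unsigned int last_dispatch_epoch;

static struct request queue[MAX_QUEUED];
static int nr_queued;

/* Add a request to the pending queue; returns false if the queue is full. */
static bool queue_request(int id)
{
    if (nr_queued >= MAX_QUEUED)
        return false;
    queue[nr_queued].id = id;
    queue[nr_queued].epoch = 0;
    nr_queued++;
    return true;
}

/* Dispatch queued requests, but never while a membership event runs. */
static void process_request_queue(void)
{
    if (event_running)
        return;                        /* block new I/O until epoch is updated */
    for (int i = 0; i < nr_queued; i++) {
        queue[i].epoch = current_epoch;
        last_dispatch_epoch = current_epoch;
        nr_outstanding_io++;
    }
    nr_queued = 0;
}

/* Called when a dispatched I/O completes. */
static void io_done(void)
{
    assert(nr_outstanding_io > 0);
    nr_outstanding_io--;
}

/* A join/leave event arrives: stop dispatching new I/O first. */
static void begin_event(void)
{
    event_running = 1;
}

/* Apply the event only once all in-flight I/O has drained, so no
 * request can still write to objects of the old epoch.  Returns false
 * if I/O is still in flight; the caller is expected to retry later. */
static bool try_finish_event(void)
{
    if (nr_outstanding_io > 0)
        return false;
    current_epoch++;                   /* update epoch/node info */
    event_running = 0;
    process_request_queue();           /* resume held-back requests */
    return true;
}
```

In this sketch, a request queued while the event is pending stays queued until try_finish_event() succeeds. It is then dispatched with the new epoch, so it can never update an object in the previous epoch after recovery.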

Thanks,

Kazutaka