On Thu, May 17, 2012 at 01:33:14AM +0900, MORITA Kazutaka wrote:
> > I think the rationale is to not change the cluster configuration while
> > I/O is in progress.  With the vnode_info structure making the cluster
> > state during an I/O operation explicit (together with hdr->epoch)
> > I suspect this isn't needed any more, but I want to do a full blown
> > audit of the I/O path first.
>
> The reason is that the recovery algorithm assumes that all objects in
> the older epoch are immutable, which means only the objects in the
> current epoch are writable.  If outstanding I/Os update objects in the
> previous epoch after they are recovered to the current epoch, their
> replicas result in inconsistent state.

That's definitely a problem for both approaches that update the epoch
and node information directly from the main thread, as I/O can still be
in flight at that point.