[sheepdog] read/write during recovery

Thu Jul 26 02:54:23 CEST 2012

At Wed, 25 Jul 2012 20:20:31 +0000,
Dietmar Maurer wrote:
> 
> > For example:
> > 
> >  1. There are two nodes, A and B.
> >  2. Node C joins Sheepdog, and journal data is written on node C until
> >     it finishes recovery.
> >  3. If node D joins Sheepdog before node C finishes recovery, node D
> >     reads actual data from nodes A and B, and journal data from node C.
> >     At the same time, node C also needs to write journal data locally
> >     to handle write requests.
> >  4. If node E joins Sheepdog before nodes C and D finish recovery, node
> >     E needs to read journal data from nodes C and D.  Node E needs to
> >     know which journal is newer to apply the journals in the correct order.
> 
> The real problem is that Sheepdog changes the node mapping as soon
> as a new node joins.

Yes, so I suggested delaying recovery of objects which are not accessed
by VMs, to avoid redundant object moves.  I thought that it looks much
simpler than changing the recovery algorithm.  Are there any problems with
it?

> For me, it seems safer to keep the current
> mapping until all new nodes are in sync.
> 
> One can implement that by tracking the node status together with epoch.
> A node can be DOWN, UP (but not synced), and UP_SYNCED.
> 
> During writes, we consider 2 mappings. One uses only UP_SYNCED nodes, the second
> considers UP and UP_SYNCED nodes. We write to all those nodes. For
> reads we only consider nodes in status UP.
> 
> Would that avoid the above error case?

Maybe it would work, but it looks complicated to me.  Wouldn't it need
many changes to the current code?
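
To make the proposal concrete, here is a minimal sketch of the two-mapping
idea as I understand it.  It is not Sheepdog code: the names Status, place,
write_targets and read_targets are hypothetical, and place() uses a simple
hash ranking as a stand-in for the real node mapping.  I also assume that
reads go only to the mapping built from UP_SYNCED nodes, since only those
nodes are guaranteed to hold the data; that is my interpretation of the
proposal.

```python
import hashlib
from enum import Enum


class Status(Enum):
    DOWN = 0
    UP = 1          # joined, but recovery has not finished
    UP_SYNCED = 2   # joined and fully recovered


def place(oid, nodes, copies):
    """Pick up to `copies` nodes for an object.

    Hypothetical stand-in for the real node mapping: rank nodes by a
    hash of (object id, node name) and take the first `copies`.
    """
    def score(node):
        return hashlib.sha1(f"{oid}:{node}".encode()).hexdigest()
    return sorted(nodes, key=score)[:copies]


def write_targets(oid, status, copies):
    """Union of the old mapping (synced nodes only) and the new mapping
    (every node that is not DOWN), without duplicates."""
    synced = [n for n, s in status.items() if s is Status.UP_SYNCED]
    alive = [n for n, s in status.items() if s is not Status.DOWN]
    old_map = place(oid, synced, copies)
    new_map = place(oid, alive, copies)
    return list(dict.fromkeys(old_map + new_map))


def read_targets(oid, status, copies):
    """Assumption: reads use only the mapping built from synced nodes."""
    synced = [n for n, s in status.items() if s is Status.UP_SYNCED]
    return place(oid, synced, copies)
```

With A and B synced and C still recovering, reads for an object stay on A
and B, while writes go to A and B plus C whenever the new mapping selects
C.  Once C becomes UP_SYNCED the two mappings are identical again.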

Thanks,

Kazutaka