[sheepdog] read/write during recovery

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Jul 26 02:54:23 CEST 2012


At Wed, 25 Jul 2012 20:20:31 +0000,
Dietmar Maurer wrote:
> 
> > For example:
> > 
> >  1. There are two nodes, A and B.
> >  2. Node C joins Sheepdog, and journal data is written on node C until
> >     it finishes recovery.
> >  3. If node D joins Sheepdog before Node C finishes recovery, the node
> >     reads actual data from node A and B, and journal data from node C.
> >     At the same time, node C also needs to write journal data locally
> >     to handle write requests.
> >  4. If node E joins Sheepdog before node C and D finish recovery, node
> >     E needs to read journal data from node C and D.  Node E needs to
> >     know which journal is newer to apply journal in the correct order.
> 
> The real problem is that sheepdog changes the node mapping as soon
> as a new node joins.

Yes, so I suggested delaying the recovery of objects that are not
accessed by VMs, to avoid redundant object moves.  I think that is much
simpler than changing the recovery algorithm.  Are there any problems
with it?
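
To illustrate the idea of delaying recovery, here is a minimal sketch.
It is not sheepdog code: the class `LazyRecovery`, its methods, and the
`fetch` callback are hypothetical names chosen for this example.  An
object is copied immediately only when a VM touches it; everything else
waits, and the pending set is recomputed on every membership change so
that an untouched object is moved only according to the newest mapping.

```python
class LazyRecovery:
    """Hypothetical per-node recovery state (illustration only)."""

    def __init__(self, fetch, present=()):
        self.fetch = fetch            # callable(oid): copy one object to this node
        self.present = set(present)   # objects already stored locally
        self.pending = set()          # objects owned but not yet copied

    def on_membership_change(self, owned):
        # Recompute against the newest mapping only, discarding work
        # that an older mapping would have required.
        self.pending = set(owned) - self.present

    def on_vm_access(self, oid):
        # An object a VM actually needs is recovered right away.
        if oid in self.pending:
            self._recover(oid)

    def recover_idle(self):
        # Deferred recovery of everything no VM has asked for yet.
        for oid in sorted(self.pending):
            self._recover(oid)

    def _recover(self, oid):
        self.fetch(oid)
        self.present.add(oid)
        self.pending.discard(oid)
```

In this sketch, a node that joins and is then remapped before it
calls `recover_idle()` never fetches the objects that the second
mapping no longer assigns to it.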

> For me, it seems safer to keep the current
> mapping until all new nodes are in sync.
> 
> One can implement that by tracking the node status together with epoch.
> A node can be DOWN, UP (but not synced), and UP_SYNCED.
> 
> During writes, we consider 2 mappings. The first uses only UP_SYNCED
> nodes; the second considers both UP and UP_SYNCED nodes. We write to
> all those nodes. For reads we only consider nodes in status UP.
> 
> Would that avoid the above error case?

Maybe it would work, but it looks complicated to me.  Wouldn't it need
many changes to the current code?
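
For reference, the quoted proposal could be sketched roughly as
follows.  This is only an illustration under several assumptions: the
names `NodeStatus`, `place`, `write_targets` and `read_targets` are
made up here, `place` is a simple hash-ranking stand-in rather than
sheepdog's actual placement logic, and the read rule is interpreted as
"read only from nodes that already hold complete data" (UP_SYNCED),
which is one possible reading of the quoted text.

```python
import hashlib
from enum import Enum


class NodeStatus(Enum):
    DOWN = "down"
    UP = "up"                # joined, but recovery not finished
    UP_SYNCED = "up_synced"  # joined and fully recovered


def _score(node, oid):
    # Deterministic per-(node, object) rank; a stand-in for real placement.
    digest = hashlib.sha1(f"{node}:{oid}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(nodes, oid, copies):
    """Pick up to `copies` nodes for an object from the given candidates."""
    return sorted(nodes, key=lambda n: _score(n, oid))[:copies]


def write_targets(status, oid, copies):
    """Union of the two mappings: synced-only, and all live nodes."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    live = [n for n, s in status.items() if s is not NodeStatus.DOWN]
    return set(place(synced, oid, copies)) | set(place(live, oid, copies))


def read_targets(status, oid, copies):
    """Assumption: reads use only the mapping over synced nodes."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    return place(synced, oid, copies)
```

Even in this small form, every request path has to consult per-node
status in addition to the epoch, which hints at how many places in the
existing code would be affected.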

Thanks,

Kazutaka


