[sheepdog] read/write during recovery

Thu Jul 26 08:26:04 CEST 2012

> > > For example:
> > >
> > >  1. There are two nodes, A and B.
> > >  2. Node C joins Sheepdog, and journal data is written on node C until
> > >     it finishes recovery.
> > >  3. If node D joins Sheepdog before Node C finishes recovery, the node
> > >     reads actual data from node A and B, and journal data from node C.
> > >     At the same time, node C also needs to write journal data locally
> > >     to handle write requests.
> > >  4. If node E joins Sheepdog before node C and D finish recovery, node
> > >     E needs to read journal data from node C and D.  Node E needs to
> > >     know which journal is newer to apply them in the correct order.
> >
> > The real problem is that sheepdog changes the node mapping as soon as a
> > new node joins.
> 
> Yes, so I suggested delaying recovery of objects which are not accessed by
> VMs to avoid redundant object moves.  I thought that it looks much simpler
> than changing the recovery algorithm.  Are there any problems with it?

Not really. It just seems more natural to me, because you can then manually
trigger re-balance. You simply have better control.

> > For me, it seems safer to keep the current mapping until all new nodes
> > are in sync.
> >
> > One can implement that by tracking the node status together with epoch.
> > A node can be DOWN, UP (but not synced), and UP_SYNCED.
> >
> > During writes, we consider 2 mappings. One only using UP_SYNCED nodes,
> > the second considers UP and UP_SYNCED nodes. We write to all those
> > nodes. For reads we only consider nodes in status UP.
> >
> > That would avoid the above error case?
> 
> Maybe it would work, but looks complicated to me.  Doesn't it need many
> changes to the current code?

I am quite new to the project, so I can't really tell.
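
To make the two-mapping idea quoted above a bit more concrete, here is a
rough sketch in Python. It is only an illustration, not sheepdog code: the
names NodeStatus, place, read_targets and write_targets are made up, and
place() is a simple hash-ranking stand-in for sheepdog's real object
placement. It also assumes that reads should go only to synced nodes, since
a node that is UP but not yet synced may not hold the object.

```python
import hashlib
from enum import Enum


class NodeStatus(Enum):
    DOWN = 0
    UP = 1         # joined, recovery still running
    UP_SYNCED = 2  # joined and fully recovered


def place(oid, nodes, copies):
    """Hypothetical placement: rank nodes by a hash of (oid, node name)
    and keep the first `copies`. Stand-in for the real node mapping."""
    def rank(name):
        return hashlib.sha1(f"{oid}:{name}".encode()).hexdigest()
    return sorted(nodes, key=rank)[:copies]


def read_targets(oid, status, copies):
    """Reads use the old mapping, built from synced nodes only (assumption)."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    return place(oid, synced, copies)


def write_targets(oid, status, copies):
    """Writes go to the union of both mappings: the one over synced nodes
    and the one over all joined nodes (UP and UP_SYNCED)."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    joined = [n for n, s in status.items() if s is not NodeStatus.DOWN]
    old = place(oid, synced, copies)
    new = place(oid, joined, copies)
    return old + [n for n in new if n not in old]


# Example: A and B are synced, C has just joined, D is down.
status = {
    "A": NodeStatus.UP_SYNCED,
    "B": NodeStatus.UP_SYNCED,
    "C": NodeStatus.UP,
    "D": NodeStatus.DOWN,
}
print(read_targets(42, status, copies=2))
print(write_targets(42, status, copies=2))
```

With this, the read mapping does not change when C joins. It only changes
once C is marked UP_SYNCED, and by then C has already received every write
that the new mapping assigns to it.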

- Dietmar