[sheepdog] read/write during recovery
Dietmar Maurer
dietmar at proxmox.com
Thu Jul 26 08:26:04 CEST 2012
> > > For example:
> > >
> > > 1. There are two nodes, A and B.
> > > 2. Node C joins Sheepdog, and journal data is written on node C until
> > > it finishes recovery.
> > > 3. If node D joins Sheepdog before Node C finishes recovery, the node
> > > reads actual data from node A and B, and journal data from node C.
> > > At the same time, node C also needs to write journal data locally
> > > to handle write requests.
> > > 4. If node E joins Sheepdog before node C and D finish recovery, node
> > > E needs to read journal data from node C and D. Node E needs to
> > > know which journal is newer to apply the journals in the correct order.
> >
> > The real problem is that sheepdog changes the node mapping as soon as a
> > new node joins.
>
> Yes, so I suggested delaying recovery of objects which are not accessed by
> VMs to avoid redundant object moves. I thought that it looks much simpler
> than changing the recovery algorithm. Are there any problems with it?
Not really. It just seems more natural to me, because you can then manually
trigger a re-balance. You simply have better control.
> > For me, it seems safer to keep the current mapping until all new nodes
> > are in sync.
> >
> > One can implement that by tracking the node status together with epoch.
> > A node can be DOWN, UP (but not synced), and UP_SYNCED.
> >
> > During writes, we consider 2 mappings. One uses only UP_SYNCED nodes,
> > the second considers UP and UP_SYNCED nodes. We write to all those
> > nodes. For reads we only consider nodes in status UP.
> >
> > Would that avoid the above error case?
>
> Maybe it would work, but it looks complicated to me. Wouldn't it need many
> changes to the current code?
I am quite new to the project, so I can't really tell.
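
To make the status-tracking idea quoted above more concrete, here is a
minimal sketch in Python rather than sheepdog's C. Everything in it is an
assumption for illustration only: the names NodeStatus, place, write_targets
and read_targets are hypothetical, the hash ranking is a stand-in for
sheepdog's real object placement, and the read rule is interpreted as "only
nodes that already hold complete data", i.e. UP_SYNCED.

```python
import hashlib
from enum import Enum


class NodeStatus(Enum):
    DOWN = 0
    UP = 1          # joined, recovery not finished
    UP_SYNCED = 2   # joined and fully recovered


def place(oid, nodes, copies):
    """Hypothetical placement: rank nodes by a stable hash of (node, oid)."""
    def rank(node):
        return hashlib.sha1(f"{node}:{oid}".encode()).hexdigest()
    return sorted(nodes, key=rank)[:copies]


def write_targets(oid, status, copies):
    """Union of the synced-only mapping and the mapping over all live nodes."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    live = [n for n, s in status.items() if s is not NodeStatus.DOWN]
    both = place(oid, synced, copies) + place(oid, live, copies)
    return list(dict.fromkeys(both))  # drop duplicates, keep order


def read_targets(oid, status, copies):
    """Assumption: reads use only nodes that already hold complete data."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    return place(oid, synced, copies)
```

With A and B synced and C still recovering, reads in this sketch stay on A
and B, while writes also reach C whenever the second mapping selects it. Once
C is marked UP_SYNCED, both mappings coincide and the extra writes stop.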
- Dietmar