> For example:
>
> 1. There are two nodes, A and B.
> 2. Node C joins Sheepdog, and journal data is written on node C until
>    it finishes recovery.
> 3. If node D joins Sheepdog before node C finishes recovery, node D
>    reads actual data from nodes A and B, and journal data from node C.
>    At the same time, node C also needs to write journal data locally
>    to handle write requests.
> 4. If node E joins Sheepdog before nodes C and D finish recovery, node
>    E needs to read journal data from nodes C and D. Node E needs to
>    know which journal is newer to apply the journals in the correct
>    order.

The real problem is that Sheepdog changes the node mapping as soon as a
new node joins. To me, it seems safer to keep the current mapping until
all new nodes are in sync.

One could implement that by tracking the node status together with the
epoch. A node can be DOWN, UP (but not synced), or UP_SYNCED.

During writes, we consider two mappings: one using only UP_SYNCED nodes,
the other using both UP and UP_SYNCED nodes. We write to all nodes
selected by either mapping. For reads, we only consider nodes in status
UP_SYNCED, since those are the only ones guaranteed to hold complete
data.

Would that avoid the error case above?
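
To make the idea concrete, here is a rough Python sketch of the two
mappings. It is not Sheepdog code: `NodeStatus`, `place`,
`write_targets` and `read_targets` are names made up for illustration,
and `place` is a simple hash-ranking stand-in for whatever placement
function Sheepdog really uses. It also assumes that reads are meant to
go to fully synced nodes only.

```python
import hashlib
from enum import Enum


class NodeStatus(Enum):
    DOWN = "down"
    UP = "up"                # joined, but still recovering
    UP_SYNCED = "up_synced"  # joined and fully recovered


def place(object_id, nodes, copies):
    """Hypothetical placement: rank nodes by a hash of (object, node)
    and keep the first `copies` of them."""
    def rank(node):
        return hashlib.sha1(f"{object_id}:{node}".encode()).hexdigest()
    return sorted(nodes, key=rank)[:copies]


def write_targets(object_id, status, copies):
    """Union of the two mappings: the old one (UP_SYNCED nodes only)
    and the new one (UP and UP_SYNCED nodes)."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    joined = [n for n, s in status.items() if s is not NodeStatus.DOWN]
    targets = set(place(object_id, synced, copies))
    targets |= set(place(object_id, joined, copies))
    return targets


def read_targets(object_id, status, copies):
    """Reads use only the mapping over fully synced nodes (assumption)."""
    synced = [n for n, s in status.items() if s is NodeStatus.UP_SYNCED]
    return place(object_id, synced, copies)


# Nodes A and B are in sync, C has just joined, D is down.
status = {
    "A": NodeStatus.UP_SYNCED,
    "B": NodeStatus.UP_SYNCED,
    "C": NodeStatus.UP,
    "D": NodeStatus.DOWN,
}
reads = read_targets("obj-1", status, copies=2)
writes = write_targets("obj-1", status, copies=2)
```

With two copies, reads stay on A and B while C recovers. Writes go to A
and B as well, plus C if the new mapping selects it, so C already
receives current data and would not need a separate journal. Once C is
marked UP_SYNCED, both mappings coincide again.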