> > I like the idea but I think it makes the recovery more complex.
> >
> > For example:
> >
> > 1. There are two nodes, A and B.
> > 2. Node C joins Sheepdog, and journal data is written on node C until
> >    it finishes recovery.
> > 3. If node D joins Sheepdog before node C finishes recovery, node D
> >    reads actual data from nodes A and B, and journal data from node C.
> >    At the same time, node C also needs to write journal data locally
> >    to handle write requests.
> > 4. If node E joins Sheepdog before nodes C and D finish recovery, node
> >    E needs to read journal data from nodes C and D. Node E needs to
> >    know which journal is newer to apply the journals in the correct order.
> >
> > The situation becomes more complex if we have more nodes. Do you have
> > any ideas for a simple way to handle node failure with journal data?
>
> Well, I never considered such error scenarios. I thought we could simply reject
> reads during recovery, but that is not the case.
>
> The journal does not contain 'all' object data (only the pieces written), so a
> read can never be served from the journal alone. You need to wait until the data
> is recovered.

But wait. Maybe we can force the gateway node to write the whole object if
needed, so that the journal always holds complete objects?

- Dietmar
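Node E's problem in step 4 amounts to a merge: it fetches journal entries from C and D and has to replay them in the order the writes were accepted. The following is a minimal Python sketch of that merge, assuming each journal entry carries a per-object version number assigned when the write is accepted. `JournalEntry`, its `version` field, `merge_journals`, and `apply_entries` are hypothetical names for illustration, not Sheepdog code, and the sketch does not address what happens if C or D fails mid-recovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    oid: int       # object ID
    version: int   # hypothetical per-object write sequence number
    offset: int    # byte offset within the object
    data: bytes    # only the piece that was written, not the whole object


def merge_journals(*journals):
    """Combine journals fetched from several nodes into one apply order.

    Entries are sorted by (oid, version); an entry present on more than
    one node (same oid and version) is kept only once.
    """
    unique = {}
    for journal in journals:
        for entry in journal:
            unique.setdefault((entry.oid, entry.version), entry)
    return [unique[key] for key in sorted(unique)]


def apply_entries(obj: bytearray, entries) -> bytearray:
    """Replay journal entries on top of object data recovered from A and B."""
    for entry in entries:
        end = entry.offset + len(entry.data)
        if end > len(obj):
            obj.extend(b"\0" * (end - len(obj)))  # grow for writes past the end
        obj[entry.offset:end] = entry.data
    return obj


if __name__ == "__main__":
    journal_c = [JournalEntry(7, 1, 0, b"XX"), JournalEntry(7, 3, 4, b"ZZ")]
    journal_d = [JournalEntry(7, 2, 1, b"YY")]
    merged = merge_journals(journal_c, journal_d)
    print([e.version for e in merged])                           # [1, 2, 3]
    print(bytes(apply_entries(bytearray(b"aaaaaaaa"), merged)))  # b'XYYaZZaa'
```

Replaying version 2 before version 1 in this example yields `b'XXYaZZaa'` instead, which is exactly the ordering hazard raised in step 4: without some global ordering on entries, node E cannot tell which overlapping write should win.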
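The reason a read cannot be served from the journal is coverage: the journal only holds the byte ranges that were written, so a read that spans an unwritten gap has nothing to return for those bytes. Forcing the gateway to write the whole object would make every journal entry a single full extent. The Python sketch below illustrates both points under stated assumptions: `OBJECT_SIZE` is set to 4 MB as an assumed data object size, and `covers` and `whole_object_write` are hypothetical helpers, not part of Sheepdog.

```python
# Assumed data object size for this sketch (4 MB); treat it as a
# placeholder rather than a value taken from the thread.
OBJECT_SIZE = 4 * 1024 * 1024


def covers(extents, offset, length):
    """Return True if journal extents (start, size) fully cover
    the requested range [offset, offset + length)."""
    want_end = offset + length
    pos = offset
    for start, size in sorted(extents):
        if start > pos:
            break  # gap: the bytes at `pos` were never written to the journal
        pos = max(pos, start + size)
        if pos >= want_end:
            return True
    return pos >= want_end


def whole_object_write(current: bytes, offset: int, data: bytes) -> bytes:
    """Hypothetical gateway-side read-modify-write: merge a partial write
    into the full object so the journal receives all OBJECT_SIZE bytes."""
    if len(current) != OBJECT_SIZE or offset + len(data) > OBJECT_SIZE:
        raise ValueError("write must fall inside a full-size object")
    buf = bytearray(current)
    buf[offset:offset + len(data)] = data
    return bytes(buf)


if __name__ == "__main__":
    # Partial writes only: a read spanning the unwritten gap cannot be served.
    partial = [(0, 4096), (8192, 4096)]
    print(covers(partial, 0, 12288))            # False
    # After a whole-object write the journal holds one full extent.
    full = whole_object_write(bytes(OBJECT_SIZE), 4096, b"new data")
    print(covers([(0, len(full))], 0, 12288))   # True
```

The cost of the idea shows up in `whole_object_write`: in this sketch the gateway must already have the full current object before it can write, so each small write becomes a full-object read plus a full-object write.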