> I like the idea but I think it makes the recovery more complex.
>
> For example:
>
> 1. There are two nodes, A and B.
> 2. Node C joins Sheepdog, and journal data is written on node C until
>    it finishes recovery.
> 3. If node D joins Sheepdog before node C finishes recovery, the node
>    reads actual data from node A and B, and journal data from node C.
>    At the same time, node C also needs to write journal data locally
>    to handle write requests.
> 4. If node E joins Sheepdog before node C and D finish recovery, node
>    E needs to read journal data from node C and D. Node E needs to
>    know which journal is newer to apply journal in the correct order.
>
> The situation becomes more complex if we have more nodes. Do you have
> any ideas to handle node failure with journal data simply?

Well, I never considered such error scenarios.

I thought we could simply reject reads during recovery, but that is not
the case. The journal does not contain 'all' object data (only the
pieces written), so you can never do a successful read. You need to
wait until data is recovered.

- Dietmar
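
To illustrate the last point, here is a minimal sketch of why a journal of partial writes cannot serve reads on its own. It assumes a hypothetical journal made of (offset, data) entries. The names `JournalEntry`, `read_from_journal` and `apply_journal` are invented for this example and are not Sheepdog's actual journal format or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalEntry:
    """One partial write: `data` written at byte `offset` (hypothetical layout)."""
    offset: int
    data: bytes


def read_from_journal(journal: List[JournalEntry], offset: int,
                      length: int) -> Optional[bytes]:
    """Try to serve a read using only the journal.

    Returns the bytes if every requested byte was written by some journal
    entry, otherwise None, because the missing bytes exist only in the
    object that has not been recovered yet.
    """
    buf = bytearray(length)
    covered = [False] * length
    # Replay entries in order so that later writes overwrite earlier ones.
    for entry in journal:
        start = max(entry.offset, offset)
        end = min(entry.offset + len(entry.data), offset + length)
        for pos in range(start, end):
            buf[pos - offset] = entry.data[pos - entry.offset]
            covered[pos - offset] = True
    return bytes(buf) if all(covered) else None


def apply_journal(base: bytes, journal: List[JournalEntry]) -> bytes:
    """Once the object is recovered, replay the journal on top of it."""
    obj = bytearray(base)
    for entry in journal:
        end = entry.offset + len(entry.data)
        if end > len(obj):
            # Grow the object if a write extends past its current end.
            obj.extend(b"\0" * (end - len(obj)))
        obj[entry.offset:end] = entry.data
    return bytes(obj)


# Two writes received while the object itself was still being recovered.
journal = [JournalEntry(4, b"WXYZ"), JournalEntry(6, b"ab")]
partial = read_from_journal(journal, 0, 8)       # bytes 0-3 were never journaled
recovered = apply_journal(b"01234567", journal)  # possible only after recovery
```

In this sketch a read succeeds only when the journal happens to cover the whole requested range, so in general the read has to wait until the base object is recovered and the journal has been replayed on top of it.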