> > > > My naïve patch looks like this (can be optimized further):
> > >
> > > IIUC, your patch does not handle write requests because write
> > > journaling is not implemented yet, yes? I think it is not easy to
> > > implement journaling across nodes. Do you have any ideas to
> > > implement it simply?
> >
> > The idea is to simply discard those write requests. We can do that,
> > because there is at least one node which has data locally, and that
> > node applies all writes (we sync data from that node later).
>
> How do you handle the following case?
>
> 1. There are two nodes A and B (redundancy level is 2), and each node
>    has one object.
> 2. Node C joins Sheepdog, and the new placement of the object becomes
>    nodes B and C.
> 3. A VM writes data to the object, and node B completes the request
>    but node C rejects it since recovery has not started.
> 4. Node B crashes before node C gets the updated data from node B,
>    and then the written data will be lost even though only one node
>    fails. In addition, the VM can read the old object after the
>    failure, which breaks the block device semantics.

Sure, if all nodes with actual data crash you have a problem. So Sheepdog
tries to store data ASAP to make that unlikely? I guess I got it now ;-)

But using a journal for writes (during recovery) is still a good idea,
because

- no delays on write when in recovery mode
- use less memory

What do you think?
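
For reference, the four-step case quoted above can be reproduced with a toy model. This is only a sketch and not Sheepdog code: the names `Node`, `write_discarding`, `read` and `simulate` are made up for illustration, and it assumes that after the crash a read falls back to whichever live node still holds a copy of the object.

```python
# Toy model of the quoted scenario; this is NOT Sheepdog code.
# All names (Node, write_discarding, read, simulate) are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.objects = {}   # object id -> data stored locally
        self.alive = True


def write_discarding(replicas, oid, data):
    """Apply a write the way the 'discard' idea would: a replica that has
    not recovered the object yet silently drops the request."""
    applied = []
    for node in replicas:
        if node.alive and oid in node.objects:
            node.objects[oid] = data
            applied.append(node.name)
    return applied


def read(candidates, oid):
    """Return the data from the first live node that holds the object.
    Assumption: reads fall back to any node that still has a copy."""
    for node in candidates:
        if node.alive and oid in node.objects:
            return node.objects[oid]
    return None


def simulate():
    a, b, c = Node("A"), Node("B"), Node("C")
    # Step 1: A and B each hold a copy (redundancy level 2).
    a.objects["obj"] = "old"
    b.objects["obj"] = "old"
    # Step 2: C joins; the new placement is B and C, C is not recovered yet.
    replicas = [b, c]
    # Step 3: the VM writes; B applies it, C discards it.
    applied = write_discarding(replicas, "obj", "new")
    # Step 4: B crashes before C has copied the object from it.
    b.alive = False
    # The only surviving copy is the stale one on A.
    return applied, read(replicas + [a], "obj")


applied, seen = simulate()
print(applied, seen)
```

In this model only B applies the write, so once B is gone the VM sees the old data again, which is the single-node-failure data loss described in step 4.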