[sheepdog] read/write during recovery

Wed Jul 25 10:01:56 CEST 2012

> > >> My naïve patch looks like this (can be optimized further):
> >
> > >IIUC, your patch does not handle write requests because write
> > >journaling is not implemented yet, yes?  I think it is not easy to
> > >implement journaling across nodes.  Do you have any ideas to
> > >implement it simply?
> >
> > The idea is to simply discard those write requests. We can do that,
> > because there is at least one node which has data locally, and that
> > node applies all writes (we sync data from that node later).
> 
> How do you handle the following case?
> 
>  1. There are two nodes, A and B (redundancy level is 2), and each node
>     has one object.
>  2. Node C joins Sheepdog, and new placement of the object becomes
>     node B and C.
>  3. A VM writes data to the object, and node B completes the request
>     but node C rejects it since recovery is not started.
>  4. Node B crashes before node C gets the updated data from node B,
>     and then the written data will be lost even though only one node
>     fails.  In addition, the VM can read the old object after the
>     failure, which breaks the block device semantics.

Sure, if all nodes holding the actual data crash, you have a problem. So
sheepdog tries to store data ASAP to make that unlikely? I guess I got it now ;-)
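
To check that I follow the scenario, here is a toy Python model of it. This is
not Sheepdog code: the node dicts and the simulate() helper are made up for
illustration, and an object is treated as a single value rather than a block
range.

```python
# Toy model of the quoted scenario; NOT Sheepdog code.
# Each node is a dict mapping an object id to its data (hypothetical layout).

def simulate(discard_on_unrecovered):
    """Return what the VM reads from node C after node B has crashed."""
    a = {"obj": "old"}          # old placement: A and B each hold the object
    b = {"obj": "old"}
    c = {}                      # C has just joined; recovery has not started

    placement = [b, c]          # new placement after C joins

    # Step 3: the VM writes new data to every node in the new placement.
    for node in placement:
        if node is c and discard_on_unrecovered:
            continue            # C drops the write because it has no copy yet
        node["obj"] = "new"

    # Step 4: B crashes before C has synced from it.
    # If C still lacks the object, the only surviving source is A.
    if "obj" not in c:
        c["obj"] = a["obj"]

    return c["obj"]
```

With discard_on_unrecovered=True the VM reads the old data back from C even
though only B failed; if C keeps the write instead, the new data survives.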

But using a journal for writes (during recovery) is still a good idea, because

- there are no delays on writes while in recovery mode
- it uses less memory

What do you think?
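
Roughly what I have in mind, as a minimal Python sketch and not a patch. The
RecoveringNode class and its methods are hypothetical names, and the in-memory
list only stands in for what would be an on-disk log in practice. Reads of
objects that are not yet recovered are not handled here.

```python
# Sketch of a per-object write journal used while an object is not yet
# recovered. All names are hypothetical; this is not Sheepdog's API.

class RecoveringNode:
    def __init__(self):
        self.objects = {}   # oid -> bytearray, objects already present locally
        self.journal = {}   # oid -> list of (offset, data), in arrival order

    def write(self, oid, offset, data):
        """Apply the write if the object is local, otherwise journal it."""
        if oid in self.objects:
            self._apply(self.objects[oid], offset, data)
        else:
            self.journal.setdefault(oid, []).append((offset, bytes(data)))
        return True         # acknowledged without waiting for recovery

    def recover(self, oid, base):
        """Install the copy fetched from a peer, then replay journaled writes."""
        obj = bytearray(base)
        for offset, data in self.journal.pop(oid, []):
            self._apply(obj, offset, data)
        self.objects[oid] = obj

    @staticmethod
    def _apply(obj, offset, data):
        end = offset + len(data)
        if end > len(obj):
            obj.extend(b"\0" * (end - len(obj)))   # grow the object if needed
        obj[offset:end] = data
```

For example, a write of b"XY" at offset 2 that arrives before recovery is
journaled, and once recover() installs the peer's copy b"abcdef" the object
becomes b"abXYef".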