[sheepdog] recovery and consistency questions

Thu Feb 12 10:10:14 CET 2015

Hi Hitoshi,

On 10.02.2015 at 08:31, Hitoshi Mitake wrote:
> At epochs 8 and 9, clients cannot access sheepdog because not all
> members of the latest healthy epoch (in this case, 6) have gathered yet.
> In such a case, the output of the cluster info command looks like
> this:
>
> $ dog cluster info
> Cluster status: Waiting for other nodes to join cluster
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Cluster created at Tue Feb 10 16:20:10 2015
> ...
>
> So I/O to outdated objects is prevented in the above case. Access is
> allowed once nodes 1, 2, 3, and 4 from your example have gathered.
>
> After enough members of the latest healthy epoch have gathered, the sheeps
> run the recovery process. The recovery process is simple:
> 1. exchange information about owned objects with each other
> 2. list the objects which should belong to me
> 3. E <- the latest epoch
> 4. read an object from the sheeps of epoch E; these sheeps are
>    calculated based on consistent hashing
> 5. if no sheep process has the object, E <- E - 1, go back to 4
> and repeat steps 3 - 5 for each object until recovery of all objects
> is complete. So you don't need to worry about access to outdated objects :)
>
> I understand your concern well. This is a really subtle and important
> point for distributed storage systems, including sheepdog.
Thank you very much for your detailed answer. In the meantime, I think I have also found the relevant code, see
sheep/group.c#417 (enough_nodes_gathered).
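
For the archive, here is how I currently read the two rules you described (no I/O until all members of the latest healthy epoch are back, and per-object recovery that falls back epoch by epoch), as a minimal Python sketch. It is not taken from the sheepdog source: all_members_gathered, nodes_for, recover_object, and the dict-based stores are names I made up for illustration, and the hash ring is a simplified stand-in for sheepdog's real consistent hashing.

```python
import hashlib
from bisect import bisect_right
from typing import Dict, List, Optional, Set, Tuple


def all_members_gathered(healthy_members: Set[str], joined: Set[str]) -> bool:
    """I/O is allowed only once every member of the latest healthy epoch has joined."""
    return healthy_members <= joined


def _ring_pos(key: str) -> int:
    # Hypothetical ring position: first 8 bytes of SHA-1, as an integer.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def nodes_for(oid: str, members: List[str], copies: int = 1) -> List[str]:
    """Pick the nodes responsible for oid on a simplified consistent-hash ring."""
    ring = sorted(members, key=_ring_pos)
    if not ring:
        return []
    points = [_ring_pos(n) for n in ring]
    # First node clockwise from the object's position, wrapping around.
    start = bisect_right(points, _ring_pos(oid)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(copies, len(ring)))]


def recover_object(
    oid: str,
    latest_epoch: int,
    members_at: Dict[int, List[str]],
    stores: Dict[str, Dict[str, bytes]],
    copies: int = 1,
) -> Optional[Tuple[int, bytes]]:
    """Try the latest epoch first; if no node of that epoch has oid, try E - 1."""
    epoch = latest_epoch
    while epoch >= 1:
        for node in nodes_for(oid, members_at.get(epoch, []), copies):
            data = stores.get(node, {}).get(oid)
            if data is not None:
                return epoch, data
        epoch -= 1
    return None  # no epoch had a copy


if __name__ == "__main__":
    members_at = {1: ["a"], 2: ["d"]}
    stores = {"a": {"obj1": b"old"}, "d": {}}
    # Node "d" (epoch 2) has no copy, so the lookup falls back to epoch 1.
    print(recover_object("obj1", 2, members_at, stores))
```

Since the newest epoch is always tried first, an older copy is only read when no node of a newer epoch holds the object, which is how I understand the "no access to outdated objects" guarantee.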

Thanks again!

Corin