[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recovery the cluster
namei.unix at gmail.com
Thu Sep 22 08:34:17 CEST 2011
On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> At Wed, 21 Sep 2011 14:59:26 +0800,
> Liu Yuan wrote:
>> I guess this patch addresses inconsistency problem you mentioned.
>> other comments are addressed too.
> Thanks, this solves the inconsistency problem in a nice way! I've
> applied 3 patches in the v3 patchset.
Umm, actually, this only resolves the special case you mentioned (the
node we start up first must be the one that went down first, because its
epoch stores the full node membership information).
Currently, we cannot *correctly* recover the cluster if we start up a
node other than the one that went down first, and in my opinion we
cannot even handle this situation in software. Sheepdog itself cannot
determine which node holds the epoch with the full membership
information; from the outside, however, an admin can find it by hand.
So, I am afraid, sheepdog will have to rely on outside knowledge to
handle some recovery cases.
To conclude, with these patches applied, we can recover the cluster
1) from the shutdown state (nodes with the same epoch) safely, in any
start-up order
2) from the quit state (nodes with different epochs), with the
constraint that we first start the node holding the most epoch
information (the one that went down first)
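The two cases above boil down to one condition: recovery can proceed as
long as each epoch shares at least one member with the next, because
that common member carries the epoch history forward. A minimal sketch
of that check (hypothetical helper names, not sheepdog's real code or
data structures):

```python
# Hypothetical sketch: model each epoch as the set of node ids that
# were members at that epoch, and test whether the history is
# "connected", i.e. every consecutive pair of epochs overlaps.

def epochs_overlap(a, b):
    """Return True if two epoch membership sets share at least one node."""
    return bool(set(a) & set(b))

def history_recoverable(epochs):
    """True if every consecutive pair of epochs shares a member, so the
    epoch information can be carried through the whole history."""
    return all(epochs_overlap(epochs[i], epochs[i + 1])
               for i in range(len(epochs) - 1))
```

Case 1) trivially satisfies this (all nodes share the same last epoch);
case 2) satisfies it only when the first node started is the one whose
epoch log reaches back through every membership change.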
> There is still a problem we need to solve. For example:
> $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ collie cluster format
> $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> After that, we get the consistent epoch like the follows.
> Creation time Epoch Nodes
> 2011-09-22 14:18:33 6 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-22 14:18:33 5 [10.68.14.1:7000, 10.68.14.1:7001]
> 2011-09-22 14:18:33 4 [10.68.14.1:7000]
> 2011-09-22 14:18:33 3 [10.68.14.1:7002]
> 2011-09-22 14:18:33 2 [10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-22 14:18:33 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> In this case, Sheepdog discards all the objects which were stored
> before epoch 4. It is because there is no overlap between epoch 3 and
> 4, and Sheepdog cannot handle this situation now.
> I think this can be fixed with a small change. I'll dig into this
I also noticed objects being discarded by sheepdog after a similar
situation, but I have no explanation for it yet. Would you please
elaborate on the reason for this specific situation in a bit more
detail?
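For what it's worth, the epoch list in your example can be replayed with
the overlap rule directly (a toy model, using only the port numbers to
stand in for the node addresses; `first_break` is a made-up helper, not
a sheepdog function):

```python
# The epoch history from the example above, one membership set per
# epoch.  Epoch 3 ({7002}) and epoch 4 ({7000}) share no node, so no
# surviving member links the two halves of the history -- which matches
# sheepdog discarding the objects stored before epoch 4.

epochs = {
    1: {7000, 7001, 7002},
    2: {7001, 7002},
    3: {7002},
    4: {7000},
    5: {7000, 7001},
    6: {7000, 7001, 7002},
}

def first_break(epochs):
    """Return the first epoch e such that e and e+1 have no common
    member, or None if the whole history is connected."""
    for e in sorted(epochs)[:-1]:
        if not (epochs[e] & epochs[e + 1]):
            return e
    return None
```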