[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recovery the cluster

Liu Yuan namei.unix at gmail.com
Thu Sep 22 09:05:27 CEST 2011


On 09/22/2011 02:34 PM, Liu Yuan wrote:
> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
>> At Wed, 21 Sep 2011 14:59:26 +0800,
>> Liu Yuan wrote:
>>> Kazutaka,
>>>       I guess this patch addresses the inconsistency problem you
>>> mentioned. The other comments are addressed too.
>> Thanks, this solves the inconsistency problem in a nice way!  I've
>> applied 3 patches in the v3 patchset.
>>
>
> Umm, actually, this just resolves a special case, as you mentioned 
> (the first node we start up should be the one that went down first, 
> because the epoch it stored holds the full node information).
>
> Currently, we cannot recover the cluster *correctly* if we start up 
> nodes other than the first-down node, and in my opinion, we cannot 
> even handle this situation in software. Sheepdog itself cannot 
> determine which node has the epoch with the full node information. 
> However, from outside, the admin can find it by hand. So I am afraid 
> sheepdog will have to rely on outside knowledge to handle some 
> recovery cases.
>
For example, below we get an inconsistent epoch history even though the 
cluster comes up. As you mentioned, an inconsistent epoch history will 
result in data loss.

root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# collie/collie cluster format
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
root@taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i;done
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
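
The mismatch is easy to miss by eye, so a small script along the following 
lines could flag it. This is only a sketch: the function names epoch_lines 
and divergent_epochs are made up for illustration, and it assumes each 
node's `collie cluster info` output has been saved to a file with every 
history entry on a single line, in the layout shown above.

```shell
# Hypothetical helper: reduce a saved `collie cluster info` output to
# "EPOCH<TAB>[node list]" lines. Assumes one history entry per line in
# the form "YYYY-MM-DD HH:MM:SS  EPOCH [node, node, ...]".
epoch_lines() {
    awk '$1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && $3 ~ /^[0-9]+$/ && index($0, "[") {
        sub(/[ \t\r]+$/, "")                      # drop trailing whitespace
        print $3 "\t" substr($0, index($0, "["))  # epoch, then the node list
    }' "$1"
}

# Hypothetical helper: print every epoch whose node list differs between
# two saved outputs (or that is present in only one of them).
divergent_epochs() {
    # An entry that is identical in both files appears twice and is
    # dropped by `uniq -u`; whatever survives is inconsistent.
    { epoch_lines "$1"; epoch_lines "$2"; } | sort | uniq -u | cut -f1 | sort -nu
}

# Usage, assuming sheep daemons listening on ports 7000 and 7001:
#   ./collie/collie cluster info -p 7000 > /tmp/info.7000
#   ./collie/collie cluster info -p 7001 > /tmp/info.7001
#   divergent_epochs /tmp/info.7000 /tmp/info.7001
```

Run against the outputs above, comparing the port 7000 listing with either 
of the other two would report epoch 2, while comparing the port 7001 and 
7002 listings would report nothing.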

Thanks,
Yuan


