[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recovery the cluster
Liu Yuan
namei.unix at gmail.com
Thu Sep 22 09:05:27 CEST 2011
On 09/22/2011 02:34 PM, Liu Yuan wrote:
> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
>> At Wed, 21 Sep 2011 14:59:26 +0800,
>> Liu Yuan wrote:
>>> Kazutaka,
>>> I guess this patch addresses the inconsistency problem you mentioned.
>>> Other comments are addressed too.
>> Thanks, this solves the inconsistency problem in a nice way! I've
>> applied 3 patches in the v3 patchset.
>>
>
> Umm, actually, this only resolves the special case you mentioned
> (the first node we start up must be the first one that went down,
> because its epoch stores the full node information).
>
> Currently, we cannot recover the cluster *correctly* if we start up a
> node other than the first one that went down, and in my opinion, we
> cannot even handle this situation in software: Sheepdog itself cannot
> determine which node has the epoch with the full node information.
> However, the admin can find it by hand from outside. So, I'm afraid,
> sheepdog will have to rely on outside knowledge to handle some recovery
> cases.
>
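The manual step the admin would do can be sketched roughly like this. It is only an illustration of the reasoning, not sheepdog code: the input format (each node's newest stored epoch, transcribed by hand as Python data) and the helper name pick_first_node are assumptions, and it assumes nodes went down one at a time, so each departure bumped the epoch on the survivors and the first node down kept the largest member list.

```python
def pick_first_node(latest_epochs):
    """Return the node whose newest stored epoch lists the most members.

    latest_epochs maps a node address to (epoch_number, member_list) for the
    newest epoch found in that node's store. This format is hypothetical and
    is filled in by hand by the admin.
    Assumption: nodes went down one at a time, so the first node to go down
    still holds the full membership in its newest epoch.
    """
    # Prefer the longest member list; on a tie, prefer the lower epoch number,
    # since the first node down stopped recording epochs earliest.
    return max(latest_epochs,
               key=lambda node: (len(latest_epochs[node][1]),
                                 -latest_epochs[node][0]))


# Illustrative state after killing nodes in the order 7000, 7001, 7002
# (assumed data, not read from a real store).
latest = {
    "192.168.0.1:7000": (1, ["192.168.0.1:7000", "192.168.0.1:7001",
                             "192.168.0.1:7002"]),
    "192.168.0.1:7001": (2, ["192.168.0.1:7001", "192.168.0.1:7002"]),
    "192.168.0.1:7002": (3, ["192.168.0.1:7002"]),
}
print(pick_first_node(latest))  # 192.168.0.1:7000
```

With concurrent failures the "longest list" rule may not hold, which is part of why this is left to the admin rather than automated.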
For example, below we get an inconsistent epoch history, even though the
cluster comes up. As you mentioned, an inconsistent epoch history will
result in data loss.
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# collie/collie cluster format
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
root@taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i;done
Cluster status: running
Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running
Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running
Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
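To make the inconsistency explicit, here is a small sketch that compares the epoch histories reported by each node and lists the epochs whose member lists disagree. The dict format and the helper name inconsistent_epochs are assumptions for illustration; the data is transcribed by hand from the three `collie cluster info` outputs above (ports 7000, 7001, 7002 in that order).

```python
def inconsistent_epochs(histories):
    """Return the sorted epoch numbers whose member lists differ between nodes.

    histories maps a node address to {epoch_number: member_list}, transcribed
    from each node's `collie cluster info` output (a hypothetical format).
    """
    views = {}
    for history in histories.values():
        for epoch, members in history.items():
            # Compare as sets so that a different listing order is not flagged.
            views.setdefault(epoch, set()).add(frozenset(members))
    return sorted(epoch for epoch, seen in views.items() if len(seen) > 1)


FULL = ["192.168.0.1:7000", "192.168.0.1:7001", "192.168.0.1:7002"]
# Epochs 1, 3 and 4 are reported identically by all three nodes.
common = {4: FULL, 3: ["192.168.0.1:7000", "192.168.0.1:7001"], 1: FULL}
histories = {
    "192.168.0.1:7000": {**common, 2: ["192.168.0.1:7001"]},
    "192.168.0.1:7001": {**common, 2: ["192.168.0.1:7001", "192.168.0.1:7002"]},
    "192.168.0.1:7002": {**common, 2: ["192.168.0.1:7001", "192.168.0.1:7002"]},
}
print(inconsistent_epochs(histories))  # [2]
```

Epoch 2 is the one node 7000 recorded differently, which matches the outputs above.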
Thanks,
Yuan