[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recover the cluster
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Sat Sep 24 07:45:08 CEST 2011
At Sat, 24 Sep 2011 12:20:28 +0800,
Liu Yuan wrote:
>
> On 09/23/2011 07:49 PM, MORITA Kazutaka wrote:
> > At Thu, 22 Sep 2011 15:05:27 +0800,
> > Liu Yuan wrote:
> >> On 09/22/2011 02:34 PM, Liu Yuan wrote:
> >>> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> >>>> At Wed, 21 Sep 2011 14:59:26 +0800,
> >>>> Liu Yuan wrote:
> >>>>> Kazutaka,
> >>>>> I guess this patch addresses the inconsistency problem you mentioned.
> >>>>> The other comments are addressed too.
> >>>> Thanks, this solves the inconsistency problem in a nice way! I've
> >>>> applied 3 patches in the v3 patchset.
> >>>>
> >>> Umm, actually, this only resolves a special case, as you mentioned
> >>> (the first node we start up must be the first one to have gone down,
> >>> because its epoch stores the full node information).
> >>>
> >>> Currently, we cannot recover the cluster *correctly* if we start up
> >>> nodes other than the first node that went down, and in my opinion,
> >>> we cannot even handle this situation in software. Sheepdog itself
> >>> cannot determine which node has the epoch with the full node
> >>> information; however, from the outside, the admin can find it by
> >>> hand. So, I'm afraid, sheepdog will have to rely on outside
> >>> knowledge to handle some recovery cases.
> >>>
> >> For example, below we get an inconsistent epoch history, even though
> >> the cluster comes up. As you mentioned, an inconsistent epoch history
> >> will result in data loss.
> >>
> >> root at taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> >> root at taobao:/home/dev/sheepdog# collie/collie cluster format
> >> root at taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> >> root at taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> >> root at taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i; done
> >> Cluster status: running
> >>
> >> Creation time Epoch Nodes
> >> 2011-09-22 15:03:22 4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22 3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22 2 [192.168.0.1:7001]
> >> 2011-09-22 15:03:22 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> Cluster status: running
> >>
> >> Creation time Epoch Nodes
> >> 2011-09-22 15:03:22 4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22 3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22 2 [192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> Cluster status: running
> >>
> >> Creation time Epoch Nodes
> >> 2011-09-22 15:03:22 4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22 3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22 2 [192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
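The divergence in the output above is mechanical to detect: node 7000 records epoch 2 as [192.168.0.1:7001], while the other two nodes record [192.168.0.1:7001, 192.168.0.1:7002]. As an illustrative sketch (not sheepdog code; the histories are transcribed from the output above and keyed by port for brevity), a checker could compare each node's epoch log entry by entry:

```python
# Illustrative sketch: compare per-node epoch histories and report
# any epoch whose membership list differs between nodes.
# Histories transcribed from the `cluster info` output above.
histories = {
    "7000": {1: ["7000", "7001", "7002"], 2: ["7001"],
             3: ["7000", "7001"], 4: ["7000", "7001", "7002"]},
    "7001": {1: ["7000", "7001", "7002"], 2: ["7001", "7002"],
             3: ["7000", "7001"], 4: ["7000", "7001", "7002"]},
    "7002": {1: ["7000", "7001", "7002"], 2: ["7001", "7002"],
             3: ["7000", "7001"], 4: ["7000", "7001", "7002"]},
}

def divergent_epochs(histories):
    """Return the epoch numbers on which the nodes disagree."""
    epochs = set()
    for h in histories.values():
        epochs |= set(h)
    bad = []
    for e in sorted(epochs):
        # Collect each node's view of epoch e; more than one distinct
        # membership list means the history is inconsistent at e.
        views = {tuple(h[e]) for h in histories.values() if e in h}
        if len(views) > 1:
            bad.append(e)
    return bad

print(divergent_epochs(histories))  # -> [2]
```

Any epoch that yields more than one distinct membership view is inconsistent; here that is epoch 2, which is exactly where the data-loss risk comes from.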
> > Hi Yuan,
> >
> > How about the below patch? I guess this would solve all the problem
> > we've discussed.
> Hi Kazutaka,
> Your patch fixes the problem, but I think it is a bit too complex.
> I came up with a much simpler patch, which just adds two checks in
> add_node_to_leave_list(). I also extended the leave-node idea to
> crashed-cluster recovery; it seems the leave-node concept copes with
> a crashed cluster as well. What do you think of it?
>
> I have sent the patch set in a new thread.
Thanks, I like your simpler approach. But how do we deal with the case
where the master node's epoch doesn't contain the node which has the
latest epoch? I think this is the most complicated situation.
For example:
for i in 0 1; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
./collie/collie cluster format
for i in 2 3 4; do
pkill -f "sheep /store/$((i - 2))"
./sheep/sheep /store/$i -z $i -p 700$i
sleep 1
done
for i in 3 4; do pkill -f "sheep /store/$i"; sleep 1; done
for i in 0 1 2 3 4; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
for i in 1 2 3 4; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
for i in 0 1 2 3 4; do ./collie/collie cluster info -p 700$i; done
My patch handles this, but yours doesn't. Is it possible to handle
this with a simple change? Or perhaps we don't need to consider this
case?
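The invariant at stake in this scenario can be sketched in a few lines: the node holding the highest local epoch is the only one whose log describes the true latest membership, so recovery must not start until that node has joined, even when the master's own latest epoch never lists it. This is a hypothetical model of the check, not sheepdog's actual logic; can_start_recovery and the epoch map are made-up names:

```python
# Hypothetical sketch of the condition under discussion: before
# leaving the "waiting for join" state, make sure the node holding
# the highest known epoch has actually joined, even if the master's
# latest epoch does not list it.  These names do not exist in sheepdog.

def can_start_recovery(joined, latest_epoch_of):
    """joined: set of node ids currently in the cluster.
    latest_epoch_of: node id -> highest epoch number that node stored."""
    newest = max(latest_epoch_of, key=latest_epoch_of.get)
    # The cluster may only recover once the holder of the newest
    # epoch is present; otherwise its membership view would be lost.
    return newest in joined

# Nodes 0..4; say node 2 held the highest epoch when the cluster died.
epochs = {0: 1, 1: 3, 2: 5, 3: 4, 4: 4}
print(can_start_recovery({0, 1}, epochs))     # False: node 2 missing
print(can_start_recovery({0, 1, 2}, epochs))  # True: node 2 joined
```

In the reproduction above, the master that forms first has no record of the node with the newest epoch, which is why a check based only on the master's own epoch entries cannot cover this case.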
Thanks,
Kazutaka