At Sat, 24 Sep 2011 12:20:28 +0800,
Liu Yuan wrote:
>
> On 09/23/2011 07:49 PM, MORITA Kazutaka wrote:
> > At Thu, 22 Sep 2011 15:05:27 +0800,
> > Liu Yuan wrote:
> >> On 09/22/2011 02:34 PM, Liu Yuan wrote:
> >>> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> >>>> At Wed, 21 Sep 2011 14:59:26 +0800,
> >>>> Liu Yuan wrote:
> >>>>> Kazutaka,
> >>>>> I guess this patch addresses the inconsistency problem you
> >>>>> mentioned. The other comments are addressed too.
> >>>> Thanks, this solves the inconsistency problem in a nice way! I've
> >>>> applied the 3 patches in the v3 patchset.
> >>>>
> >>> Umm, actually, this only resolves the special case you mentioned
> >>> (the node we start up first must be the one that went down first,
> >>> because its latest epoch stores the full node information).
> >>>
> >>> Currently, we cannot *correctly* recover the cluster if the node we
> >>> start up first is not the firstly-down node, and in my opinion we
> >>> cannot even handle this situation in software. Sheepdog itself
> >>> cannot determine which node has the epoch with the full node
> >>> information; from the outside, however, the admin can find it by
> >>> hand. So, I am afraid, sheepdog will have to rely on outside
> >>> knowledge to handle some recovery cases.
> >>>
> >> For example, below we get an inconsistent epoch history even though
> >> the cluster comes up (note how the first node's view of epoch 2
> >> disagrees with the other two). As you mentioned, an inconsistent
> >> epoch history will result in data loss.
> >>
> >> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> >> root@taobao:/home/dev/sheepdog# collie/collie cluster format
> >> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> >> root@taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> >> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i; done
> >> Cluster status: running
> >>
> >> Creation time        Epoch Nodes
> >> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22      2 [192.168.0.1:7001]
> >> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> Cluster status: running
> >>
> >> Creation time        Epoch Nodes
> >> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> Cluster status: running
> >>
> >> Creation time        Epoch Nodes
> >> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
> >> 2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
> >> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> >
> > Hi Yuan,
> >
> > How about the below patch? I guess this would solve all the problems
> > we've discussed.
>
> Hi Kazutaka,
> Your patch fixes the problem, but I think it is a bit too complex.
> I came up with a much simpler patch, which just adds two checks in
> add_node_to_leave_list(). I have also extended the leave-node idea to
> crashed-cluster recovery; it seems the leave-node concept copes with a
> crashed cluster as well. What do you think of it?
>
> I have sent the patch set in a new thread.
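For illustration, the two checks being described might look roughly
like the sketch below. This is a hypothetical reconstruction with
made-up types and helper names, not the actual patch (which was posted
in the new thread):

    /*
     * Hypothetical sketch -- made-up types and names, not the actual
     * sheepdog patch.  The idea of the two checks: reject a leaving
     * node whose latest epoch is older than ours (its history is
     * stale), and reject a node that is already on the leave list
     * (avoid counting it twice).
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_NODES 1024

    struct node {
            char     addr[32];
            int      port;
            uint32_t epoch;        /* the node's latest epoch number */
    };

    static struct node leave_list[MAX_NODES];
    static int         nr_leave_nodes;
    static uint32_t    local_latest_epoch;

    static bool node_eq(const struct node *a, const struct node *b)
    {
            return a->port == b->port && strcmp(a->addr, b->addr) == 0;
    }

    /* Returns true if the node was added, false if it was rejected. */
    static bool add_node_to_leave_list(const struct node *n)
    {
            int i;

            /* Check 1: a node whose epoch lags ours has stale history. */
            if (n->epoch < local_latest_epoch)
                    return false;

            /* Check 2: never add the same node twice. */
            for (i = 0; i < nr_leave_nodes; i++)
                    if (node_eq(&leave_list[i], n))
                            return false;

            leave_list[nr_leave_nodes++] = *n;
            return true;
    }

Guards of this kind would keep a stale or duplicate leaver from
corrupting the leave list, which is presumably what makes a
single-function change sufficient for the simple cases.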
Thanks, I like your simpler approach. But how do we deal with the case
where the master node's epoch doesn't contain the node which has the
latest epoch? I think this is the most complicated situation. For
example:

# start nodes 0 and 1, then create a cluster on them
for i in 0 1; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
./collie/collie cluster format

# one at a time: kill an old node, then start a new one
for i in 2 3 4; do
    pkill -f "sheep /store/$((i - 2))"
    ./sheep/sheep /store/$i -z $i -p 700$i
    sleep 1
done

# take the remaining nodes down; node 4 ends up holding the latest epoch
for i in 3 4; do pkill -f "sheep /store/$i"; sleep 1; done

# bring everything back, starting from node 0, whose epoch doesn't
# contain node 4
for i in 0 1 2 3 4; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
for i in 1 2 3 4; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done

for i in 0 1 2 3 4; do ./collie/collie cluster info -p 700$i; done

My patch handles this, but yours doesn't. Is it possible to handle it
with a simple change? Or, perhaps, do we not need to consider this
case at all?

Thanks,

Kazutaka
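To make the failure mode concrete, here is a hypothetical sketch
(made-up names, not sheepdog code) of why recovery driven by the
master's own epoch misses the newest history in the run above: the
master can only consult nodes listed in its own latest epoch, and node
4, the holder of the newest epoch, is not in node 0's epoch at all.

    #include <stddef.h>
    #include <stdint.h>

    struct member {
            int      id;
            uint32_t latest_epoch;
    };

    /* Pick the history authority among the nodes the master can see. */
    static const struct member *
    pick_history_authority(const struct member *seen, size_t nr)
    {
            const struct member *best = NULL;
            size_t i;

            for (i = 0; i < nr; i++)
                    if (best == NULL ||
                        seen[i].latest_epoch > best->latest_epoch)
                            best = &seen[i];

            /*
             * If `seen` is built from the master's own epoch, the true
             * owner of the newest epoch may be missing from it
             * entirely, so the newest history is silently ignored.
             */
            return best;
    }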