On 09/23/2011 07:49 PM, MORITA Kazutaka wrote:
> At Thu, 22 Sep 2011 15:05:27 +0800,
> Liu Yuan wrote:
>> On 09/22/2011 02:34 PM, Liu Yuan wrote:
>>> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
>>>> At Wed, 21 Sep 2011 14:59:26 +0800,
>>>> Liu Yuan wrote:
>>>>> Kazutaka,
>>>>> I guess this patch addresses the inconsistency problem you
>>>>> mentioned. The other comments are addressed too.
>>>> Thanks, this solves the inconsistency problem in a nice way! I've
>>>> applied 3 patches in the v3 patchset.
>>>>
>>> Umm, actually, this only resolves the special case you mentioned
>>> (the first node we start up should also be the first one to go down,
>>> because its epoch stores the full node information).
>>>
>>> Currently, we cannot recover the cluster *correctly* if we start up
>>> a node other than the firstly-down node first, and in my opinion we
>>> cannot even handle this situation in software. Sheepdog itself
>>> cannot determine which node has the epoch with the full node
>>> information; from outside, however, the admin can find it by hand.
>>> So I am afraid sheepdog will have to rely on outside knowledge to
>>> handle some recovery cases.
>>>
>> For example, below we get an inconsistent epoch history even though
>> the cluster comes up. As you mentioned, an inconsistent epoch history
>> will result in data loss.
>>
>> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
>> root@taobao:/home/dev/sheepdog# collie/collie cluster format
>> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
>> root@taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
>> root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i; done
>> Cluster status: running
>>
>> Creation time        Epoch Nodes
>> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
>> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
>> 2011-09-22 15:03:22      2 [192.168.0.1:7001]
>> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
>> Cluster status: running
>>
>> Creation time        Epoch Nodes
>> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
>> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
>> 2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
>> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
>> Cluster status: running
>>
>> Creation time        Epoch Nodes
>> 2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
>> 2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
>> 2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
>> 2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
> Hi Yuan,
>
> How about the below patch? I guess this would solve all the problems
> we've discussed.

Hi Kazutaka,

Your patch fixes the problem, but I think it is a bit too complex. I came
up with a much simpler patch, which just adds two checks in
add_node_to_leave_list() (a rough sketch of the idea is at the end of
this mail). I have also extended the leave-node idea to recovery of a
crashed cluster; it seems the leave-node concept copes with a crashed
cluster as well. What do you think of it? I have sent the patch set in a
new thread.

Thanks,
Yuan
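P.S. For illustration only: the standalone sketch below shows one
possible shape for such checks (drop a leave event that carries a stale
epoch, and drop a node that is already on the leave list). The types,
variables and constants here are invented placeholders and do not match
the real sheepdog sources; please refer to the patch set in the new
thread for the actual code.

/*
 * Standalone sketch of the "two checks" idea in add_node_to_leave_list().
 * All names below (leave_node, leave_list, sys_epoch, ...) are made up
 * for this example.
 */
#include <stdio.h>
#include <string.h>

#define MAX_LEAVE_NODES 64

struct leave_node {
	char addr[32];      /* "ip:port" of the node that left */
	unsigned int epoch; /* epoch the node claims it left in */
};

static struct leave_node leave_list[MAX_LEAVE_NODES];
static int nr_leave_nodes;
static unsigned int sys_epoch = 4; /* current cluster epoch */

static int add_node_to_leave_list(const struct leave_node *node)
{
	int i;

	/* check 1: ignore a leave event carrying a stale epoch; its
	 * membership information is older than what we already hold */
	if (node->epoch < sys_epoch)
		return 0;

	/* check 2: ignore duplicates, so the same node cannot be
	 * counted twice when deciding whether recovery can start */
	for (i = 0; i < nr_leave_nodes; i++)
		if (strcmp(leave_list[i].addr, node->addr) == 0)
			return 0;

	if (nr_leave_nodes >= MAX_LEAVE_NODES)
		return 0; /* list full; sketch only, real code differs */

	leave_list[nr_leave_nodes++] = *node;
	return 1;
}

int main(void)
{
	struct leave_node stale = { "192.168.0.1:7000", 2 };
	struct leave_node fresh = { "192.168.0.1:7002", 4 };

	printf("stale accepted: %d\n", add_node_to_leave_list(&stale)); /* 0 */
	printf("fresh accepted: %d\n", add_node_to_leave_list(&fresh)); /* 1 */
	printf("dup accepted:   %d\n", add_node_to_leave_list(&fresh)); /* 0 */
	return 0;
}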