On 09/22/2011 02:34 PM, Liu Yuan wrote:
> On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
>> At Wed, 21 Sep 2011 14:59:26 +0800,
>> Liu Yuan wrote:
>>> Kazutaka,
>>>      I guess this patch addresses the inconsistency problem you mentioned.
>>> Other comments are addressed too.
>> Thanks, this solves the inconsistency problem in a nice way!  I've
>> applied 3 patches in the v3 patchset.
>>
>
> Umm, actually, this only resolves the special case you mentioned
> (the first node we start up must be the one that went down first,
> because its epoch stores the full node information).
>
> Currently, we cannot recover the cluster *correctly* if we start up
> nodes other than the first-down node, and in my opinion we cannot
> even handle this situation in software. Sheepdog itself cannot
> determine which node has the epoch with the full node information.
> However, from outside, the admin can find it by hand. So I'm afraid
> sheepdog will have to rely on outside knowledge to handle some
> recovery cases.
>

For example, below we get an inconsistent epoch history even though the
cluster comes up (compare epoch 2 as reported on port 7000 with the other
two nodes). As you mentioned, an inconsistent epoch history will result in
data loss.

root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# collie/collie cluster format
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
root@taobao:/home/dev/sheepdog# for i in 1 0 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i;sleep 1;done
root@taobao:/home/dev/sheepdog# for i in 0 1 2; do ./collie/collie cluster info -p 700$i;done
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]

Thanks,
Yuan
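
P.S. The by-hand comparison of per-node epoch histories could be scripted along the following lines. This is only a rough sketch in Python, not anything sheepdog or collie provides: the function names parse_epochs and find_inconsistent_epochs are made up for illustration, and the parser assumes the table layout printed by "collie cluster info" in the session above. It only compares epochs that more than one node reports; an epoch missing on some node is not flagged.

```python
import re
from collections import defaultdict

# One row of the epoch table: "<date> <time> <epoch> [<node>, <node>, ...]"
# (layout assumed from the "collie cluster info" output quoted in this mail)
EPOCH_LINE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+(\d+)\s+\[(.*)\]\s*$"
)

def parse_epochs(text):
    """Return {epoch: frozenset(members)} for one node's cluster info output."""
    epochs = {}
    for line in text.splitlines():
        m = EPOCH_LINE.match(line)
        if m:
            members = frozenset(
                n.strip() for n in m.group(2).split(",") if n.strip()
            )
            epochs[int(m.group(1))] = members
    return epochs

def find_inconsistent_epochs(histories):
    """histories is {node: {epoch: members}}.

    Return {epoch: {node: members}} for every epoch on which at least
    two nodes report different membership.
    """
    by_epoch = defaultdict(dict)
    for node, epochs in histories.items():
        for epoch, members in epochs.items():
            by_epoch[epoch][node] = members
    return {
        epoch: views
        for epoch, views in by_epoch.items()
        if len(set(views.values())) > 1
    }

# Histories copied from the session above (ports 7000 and 7001).
NODE_7000 = """\
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
"""
NODE_7001 = """\
2011-09-22 15:03:22      4 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      3 [192.168.0.1:7000, 192.168.0.1:7001]
2011-09-22 15:03:22      2 [192.168.0.1:7001, 192.168.0.1:7002]
2011-09-22 15:03:22      1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002]
"""

histories = {
    "7000": parse_epochs(NODE_7000),
    "7001": parse_epochs(NODE_7001),
}
for epoch, views in sorted(find_inconsistent_epochs(histories).items()):
    print(epoch, {node: sorted(members) for node, members in views.items()})
```

With the two histories above, only epoch 2 is reported, which is the epoch where port 7000 disagrees with the others. A check like this only detects the disagreement; it does not tell us which node's history is the right one, so the admin still has to decide that.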