On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> At Wed, 21 Sep 2011 14:59:26 +0800,
> Liu Yuan wrote:
>> Kazutaka,
>> I guess this patch addresses the inconsistency problem you mentioned.
>> Other comments are addressed too.
> Thanks, this solves the inconsistency problem in a nice way!  I've
> applied 3 patches in the v3 patchset.
>

Umm, actually, this only resolves a special case, as you mentioned (the
first node we start up should be the one that went down first, because
its epoch stores the full node information).

Currently, we cannot recover the cluster *correctly* if we start up
nodes other than the first-down node first, and in my opinion, we
cannot even handle this situation in software. Sheepdog itself cannot
determine who has the epoch with the full node information. However,
from outside, the admin can find it by hand. So I'm afraid Sheepdog
will have to rely on outside knowledge to handle some recovery cases.

To conclude, with these patches applied, we can recover the cluster
1) from the shutdown state (nodes with the same epoch) safely,
   regardless of start-up order;
2) from the quit state (nodes with different epochs), with the
   constraint that we start up the node with the most epoch
   information (the first one down) first.

> There is still a problem we need to solve.  For example:
>
>   $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
>   $ collie cluster format
>   $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
>   $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
>   $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
>
> After that, we get a consistent epoch log like the following.
>
>   Creation time        Epoch  Nodes
>   2011-09-22 14:18:33      6  [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
>   2011-09-22 14:18:33      5  [10.68.14.1:7000, 10.68.14.1:7001]
>   2011-09-22 14:18:33      4  [10.68.14.1:7000]
>   2011-09-22 14:18:33      3  [10.68.14.1:7002]
>   2011-09-22 14:18:33      2  [10.68.14.1:7001, 10.68.14.1:7002]
>   2011-09-22 14:18:33      1  [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
>
> In this case, Sheepdog discards all the objects which were stored
> before epoch 4.  It is because there is no overlap between epoch 3 and
> 4, and Sheepdog cannot handle this situation now.
>
> I think this can be fixed with a small change.  I'll dig into this
> issue.
>
>
> Thanks,
>
> Kazutaka

Hi Kazutaka,

I also noticed objects being discarded by Sheepdog after a similar
situation, but I have no idea why for now. Would you please elaborate
on the reason for this specific situation?

Thanks,
Yuan
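
For reference, the "no overlap between epoch 3 and 4" condition from the
quoted example can be checked mechanically. Below is a minimal sketch in
Python, assuming the epoch log is available as a plain mapping from epoch
number to node list. The names `find_epoch_gaps` and `epoch_log` are
hypothetical and are not part of Sheepdog or collie; nodes are abbreviated
to their port numbers on 10.68.14.1.

```python
# Hypothetical helper, not part of Sheepdog: report consecutive epochs
# whose node lists have no member in common.

def find_epoch_gaps(epoch_log):
    """Return (older, newer) epoch pairs that share no node."""
    epochs = sorted(epoch_log)
    gaps = []
    # Walk adjacent epoch pairs in ascending order.
    for older, newer in zip(epochs, epochs[1:]):
        if not set(epoch_log[older]) & set(epoch_log[newer]):
            gaps.append((older, newer))
    return gaps


# Epoch log copied from the quoted table (ports on 10.68.14.1).
epoch_log = {
    1: [7000, 7001, 7002],
    2: [7001, 7002],
    3: [7002],
    4: [7000],
    5: [7000, 7001],
    6: [7000, 7001, 7002],
}

print(find_epoch_gaps(epoch_log))  # prints [(3, 4)]
```

This only locates where the membership chain breaks. It does not explain
why Sheepdog discards the objects stored before that point.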