[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recovery the cluster
namei.unix at gmail.com
Thu Sep 22 08:34:17 CEST 2011
On 09/22/2011 02:01 PM, MORITA Kazutaka wrote:
> At Wed, 21 Sep 2011 14:59:26 +0800,
> Liu Yuan wrote:
>> I guess this patch addresses inconsistency problem you mentioned.
>> other comments are addressed too.
> Thanks, this solves the inconsistency problem in a nice way! I've
> applied 3 patches in the v3 patchset.
Umm, actually, this only resolves the special case you mentioned (the
node we start up first must be the one that went down first, because its
epoch stores the full node membership information).
Currently, we cannot *correctly* recover the cluster if we start up a
node other than the one that went down first, and in my opinion we
cannot even handle this situation in software. Sheepdog itself cannot
determine which node holds the epoch with the full membership
information; from the outside, however, an admin can find it by hand.
So, I am afraid, sheepdog will have to rely on outside knowledge to
handle some recovery cases.
To conclude, with these patches applied, we can recover the cluster
1) from the shutdown state (nodes with the same epoch) safely, in any
start-up order
2) from the quit state (nodes with different epochs), with the
constraint that we first start the node holding the most epoch
information (the one that went down first)
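The two cases above boil down to one condition: recovery can proceed as
long as each epoch shares at least one member with the next, because
that common member carries the epoch history forward. A minimal sketch
of that check (hypothetical helper names, not sheepdog's real code or
data structures):

```python
# Hypothetical sketch: model each epoch as the set of node ids that
# were members at that epoch, and test whether the history is
# "connected", i.e. every consecutive pair of epochs overlaps.

def epochs_overlap(a, b):
    """Return True if two epoch membership sets share at least one node."""
    return bool(set(a) & set(b))

def history_recoverable(epochs):
    """True if every consecutive pair of epochs shares a member, so the
    epoch information can be carried through the whole history."""
    return all(epochs_overlap(epochs[i], epochs[i + 1])
               for i in range(len(epochs) - 1))
```

Case 1) trivially satisfies this (all nodes share the same last epoch);
case 2) satisfies it only when the first node started is the one whose
epoch log reaches back through every membership change.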
> There is still a problem we need to solve. For example:
> $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ collie cluster format
> $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> After that, we get the consistent epoch like the follows.
> Creation time Epoch Nodes
> 2011-09-22 14:18:33 6 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-22 14:18:33 5 [10.68.14.1:7000, 10.68.14.1:7001]
> 2011-09-22 14:18:33 4 [10.68.14.1:7000]
> 2011-09-22 14:18:33 3 [10.68.14.1:7002]
> 2011-09-22 14:18:33 2 [10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-22 14:18:33 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> In this case, Sheepdog discards all the objects which were stored
> before epoch 4. It is because there is no overlap between epoch 3 and
> 4, and Sheepdog cannot handle this situation now.
> I think this can be fixed with a small change. I'll dig into this
I also noticed objects being discarded by sheepdog after a similar
situation, but I have no explanation for it yet. Would you please
elaborate on the reason for this specific situation in a bit more
detail?
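For what it's worth, the epoch list in your example can be replayed with
the overlap rule directly (a toy model, using only the port numbers to
stand in for the node addresses; `first_break` is a made-up helper, not
a sheepdog function):

```python
# The epoch history from the example above, one membership set per
# epoch.  Epoch 3 ({7002}) and epoch 4 ({7000}) share no node, so no
# surviving member links the two halves of the history -- which matches
# sheepdog discarding the objects stored before epoch 4.

epochs = {
    1: {7000, 7001, 7002},
    2: {7001, 7002},
    3: {7002},
    4: {7000},
    5: {7000, 7001},
    6: {7000, 7001, 7002},
}

def first_break(epochs):
    """Return the first epoch e such that e and e+1 have no common
    member, or None if the whole history is connected."""
    for e in sorted(epochs)[:-1]:
        if not (epochs[e] & epochs[e + 1]):
            return e
    return None
```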