[Sheepdog] [PATCH V2 2/2] sheep: teach sheepdog to better recovery the cluster

Thu Sep 22 08:01:47 CEST 2011

At Wed, 21 Sep 2011 14:59:26 +0800,
Liu Yuan wrote:
> 
> Kazutaka,
>      I guess this patch addresses inconsistency problem you mentioned. 
> other comments are addressed too.

Thanks, this solves the inconsistency problem in a nice way!  I've
applied 3 patches in the v3 patchset.

There is still a problem we need to solve.  For example:

  $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
  $ collie cluster format
  $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
  $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
  $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done

After that, we get a consistent epoch log like the following.

  Creation time        Epoch Nodes
  2011-09-22 14:18:33      6 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
  2011-09-22 14:18:33      5 [10.68.14.1:7000, 10.68.14.1:7001]
  2011-09-22 14:18:33      4 [10.68.14.1:7000]
  2011-09-22 14:18:33      3 [10.68.14.1:7002]
  2011-09-22 14:18:33      2 [10.68.14.1:7001, 10.68.14.1:7002]
  2011-09-22 14:18:33      1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]

In this case, Sheepdog discards all the objects which were stored
before epoch 4.  This is because the node lists of epochs 3 and 4 have
no node in common, and Sheepdog currently cannot handle this situation.
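
To make the gap concrete, here is a minimal sketch in Python.  It is
not Sheepdog code: the function name and the data layout are
assumptions made only for illustration.  It walks the epoch log shown
above and reports every pair of consecutive epochs whose node lists
share no node.

```python
# Hypothetical illustration only; not taken from the Sheepdog source.
# Epoch log copied from the listing above: epoch number -> node list.
EPOCHS = {
    1: ["10.68.14.1:7000", "10.68.14.1:7001", "10.68.14.1:7002"],
    2: ["10.68.14.1:7001", "10.68.14.1:7002"],
    3: ["10.68.14.1:7002"],
    4: ["10.68.14.1:7000"],
    5: ["10.68.14.1:7000", "10.68.14.1:7001"],
    6: ["10.68.14.1:7000", "10.68.14.1:7001", "10.68.14.1:7002"],
}


def find_disjoint_transitions(epochs):
    """Return (prev, next) epoch pairs whose node lists share no node."""
    gaps = []
    numbers = sorted(epochs)
    # Compare each epoch with the one that immediately follows it.
    for prev, nxt in zip(numbers, numbers[1:]):
        if not set(epochs[prev]) & set(epochs[nxt]):
            gaps.append((prev, nxt))
    return gaps


print(find_disjoint_transitions(EPOCHS))  # prints [(3, 4)]
```

Only the transition from epoch 3 to epoch 4 is reported; every other
pair of consecutive epochs has at least one node in common.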

I think this can be fixed with a small change.  I'll dig into this
issue.

Thanks,

Kazutaka