[sheepdog] [PATCH V2 00/11] INTRODUCE

Liu Yuan namei.unix at gmail.com
Tue Aug 21 04:46:19 CEST 2012


On 08/21/2012 02:29 AM, MORITA Kazutaka wrote:
> I think delaying recovery for a few seconds always is useful for many
> users.  Under heavy network load, sheep can wrongly detect node
> failure and node membership can change frequently.  Delaying recovery
> for a short time makes Sheepdog tolerant against such situation.

I think your example is very vague; which cluster driver are you using?
Sheep itself doesn't sense membership changes, it relies on the cluster
drivers to maintain membership. Could you describe exactly how this
happens in a real case?

If you are talking about the network partition problem, I don't think
delaying recovery will help solve it. We have run into network partitions
when we used the corosync driver; with the zookeeper driver we haven't
met one yet (I guess we won't, since zookeeper acts as a central
membership control).

Suppose we have 6 nodes in a cluster, A,B,C,D,E,F, with one copy per
object and epoch = 1. At time t1 the network gets partitioned and three
partitions show up: c1(A,B,C), c2(D,E), c3(F). Each partition bumps its
epoch once for every node it sees leave, so the epochs of the three
partitions become epoch(c1=4, c2=5, c3=6), and all 3 partitions proceed
to recover and apply updates to their local objects.
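
Just to make the arithmetic explicit, here is a rough sketch (not actual
sheep code), assuming each node departure a partition observes bumps its
epoch by one, starting from epoch 1:

    #include <stdio.h>

    int main(void)
    {
        int start_epoch = 1;
        int cluster_size = 6;                 /* A,B,C,D,E,F */
        int partition_size[] = { 3, 2, 1 };   /* c1(A,B,C), c2(D,E), c3(F) */
        int i;

        for (i = 0; i < 3; i++) {
            /* each partition observes (cluster_size - its own size)
             * leave events while the network is split */
            int left = cluster_size - partition_size[i];
            printf("c%d: epoch = %d\n", i + 1, start_epoch + left);
        }
        return 0;   /* prints c1 = 4, c2 = 5, c3 = 6 */
    }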

In your example above, suppose these 3 partitions were automatically
merged back into one. After merging this means:
1) the epochs would be epoch(c1=7, c2=9, c3=11), since each partition
   bumps its epoch again for every node it sees join;
2) there is no code to handle objects with different versions, where
   every node thinks its own local version is the correct one.
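
Continuing the same hypothetical sketch (again, not sheep code), the
merge arithmetic looks like this, one epoch bump per join event:

    #include <stdio.h>

    int main(void)
    {
        int cluster_size = 6;
        int partition_size[] = { 3, 2, 1 };    /* c1, c2, c3 */
        int epoch_when_split[] = { 4, 5, 6 };  /* epochs reached while split */
        int i;

        for (i = 0; i < 3; i++) {
            /* on merge, each partition sees the missing nodes join again */
            int joins = cluster_size - partition_size[i];
            printf("c%d: epoch after merge = %d\n",
                   i + 1, epoch_when_split[i] + joins);
        }
        /* prints c1 = 7, c2 = 9, c3 = 11: three different epochs for one
         * merged cluster, while every node still trusts its own local
         * object versions */
        return 0;
    }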

So I think we have to handle the epoch mismatch and object multi-version
problems before evaluating delayed recovery for network partitions.

If you are not talking about the network partition problem, then I think
the only case we can hit is stopping/restarting a node for manual
maintenance, where I think manual recovery could really be helpful.

Thanks,
Yuan


