[sheepdog] [PATCH v2 04/11] sheep: don't check nodes information for joined nodes

MORITA Kazutaka morita.kazutaka at gmail.com
Sat Sep 21 17:38:31 CEST 2013


At Sat, 21 Sep 2013 00:54:10 +0800,
Liu Yuan wrote:
> 
> On Sat, Sep 21, 2013 at 01:48:27AM +0900, MORITA Kazutaka wrote:
> > At Sat, 21 Sep 2013 00:39:31 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Sat, Sep 21, 2013 at 01:22:46AM +0900, MORITA Kazutaka wrote:
> > > > At Thu, 19 Sep 2013 17:08:08 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > On Thu, Sep 19, 2013 at 02:42:54AM +0900, MORITA Kazutaka wrote:
> > > > > > At Sat, 14 Sep 2013 18:34:24 +0800,
> > > > > > Liu Yuan wrote:
> > > > > > > 
> > > > > > > cluster_join_check is basically used to check a newly joining node. But the old
> > > > > > > code also checks the node states passed by cinfo against sys->cinfo. After we have
> > > > > > > struct rb_node rb in the sd_node, this check will never pass.
> > > > > > > 
> > > > > > > Instead of doing the check with more complex code, this patch simply removes the
> > > > > > > check since node states in the joined nodes are always the same.
> > > > > > 
> > > > > > Is it true?  E.g. if a network partition happens and two subclusters are
> > > > > > merged, the state of the joining node doesn't match.  The current code
> > > > > > can detect it, but this patch removes the check?
> > > > > > 
> > > > > 
> > > > > Why don't we allow nodes that were network partitioned to join back? If a user asks
> > > > > to join the node, I think we should allow the node to join back, no?
> > > > 
> > > > We cannot allow a rejoin after a network partition happens.  E.g.
> > > > 
> > > >   1. Sheepdog is running with nodes {A, B, C, D, E} at epoch 1.
> > > >   2. The cluster is split into {A, B, C} and {D, E}.  Both are at
> > > >      epoch 2.
> > > >   3. The node A fails.  Then {B, C} is at epoch 3, and {D, E} is at
> > > >      epoch 2.
> > > >   4. If the two clusters are merged, their epoch numbers are not
> > > >      consistent.
> > > > 
> > > > I think we should at least keep the epoch number check.
> > > > 
> > > 
> > > I don't think we can do an epoch check either. Suppose
> > > 
> > > {A, B, C} at epoch 1
> > > C goes down, then {A, B} with epoch = 2
> > > when we add C back with epoch 1, C should join the cluster.
> > 
> > The example is too simple.  When talking about network partition,
> > let's take into account the case where both subclusters have at least
> > two nodes.
> > 
> > {A, B, C, D} at epoch 1.
> > {A, B} and {C, D} at epoch 2.
> > {A, B} at epoch 2, and {C} at epoch 3 (the node D has gone).
> > 
> > Then, if {A, B} and {C} are merged, epoch numbers become inconsistent.
> 
> What did you mean by epoch inconsistent? For the current code, in your case

I meant the result of 'dog cluster info' could be different among the
nodes.

> C's cinfo will be reinited from A or B in the update_cluster_info in the
> accept handler. What exactly is the problem? If we simply check that epochs are equal, then we

update_cluster_info updates only the latest epoch and doesn't update
the past epoch history.  We assume that every node has the same epoch
history, especially in the recovery code.

> can't join a node back if it fails, no?

No, but I think there is no other way.  I don't think the current
sheepdog code can handle my example correctly, and, IMHO, it's safe to
prevent the invalid node from joining Sheepdog.
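
To make the {A, B, C, D} example concrete, here is a minimal sketch of
the kind of check I have in mind.  It is not sheepdog code: struct
epoch_log, epoch_history_matches() and the NODE_* bitmasks are
hypothetical names used only for illustration, and the per-epoch
membership is modelled as a simple bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_EPOCH 8

/* Hypothetical node ids, one bit per node (not sheepdog's sd_node). */
enum { NODE_A = 1 << 0, NODE_B = 1 << 1, NODE_C = 1 << 2, NODE_D = 1 << 3 };

/* Hypothetical epoch history: members[e] is the node set at epoch e. */
struct epoch_log {
	uint32_t epoch;                 /* latest epoch number */
	uint8_t members[MAX_EPOCH + 1]; /* index 0 is unused */
};

/*
 * Return true only if both sides are at the same epoch and agree on
 * the membership of every past epoch as well.
 */
bool epoch_history_matches(const struct epoch_log *a,
			   const struct epoch_log *b)
{
	if (a->epoch != b->epoch || a->epoch > MAX_EPOCH)
		return false;

	for (uint32_t e = 1; e <= a->epoch; e++)
		if (a->members[e] != b->members[e])
			return false;

	return true;
}

/* The partition example: {A, B} at epoch 2 versus {C} at epoch 3. */
bool partition_example_join_allowed(void)
{
	struct epoch_log ab = {
		.epoch = 2,
		.members = { 0, NODE_A | NODE_B | NODE_C | NODE_D,
			     NODE_A | NODE_B },
	};
	struct epoch_log c = {
		.epoch = 3,
		.members = { 0, NODE_A | NODE_B | NODE_C | NODE_D,
			     NODE_C | NODE_D, NODE_C },
	};

	return epoch_history_matches(&ab, &c);
}
```

In this model partition_example_join_allowed() returns false.  Even if
C's latest epoch were overwritten to 2, the two sides would still
disagree about the members of epoch 2, which is the inconsistency that
updating only the latest epoch cannot repair.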

Thanks,

Kazutaka
