[sheepdog] [PATCH v2 04/11] sheep: don't check nodes information for joined nodes

Liu Yuan namei.unix at gmail.com
Fri Sep 20 18:54:10 CEST 2013


On Sat, Sep 21, 2013 at 01:48:27AM +0900, MORITA Kazutaka wrote:
> At Sat, 21 Sep 2013 00:39:31 +0800,
> Liu Yuan wrote:
> > 
> > On Sat, Sep 21, 2013 at 01:22:46AM +0900, MORITA Kazutaka wrote:
> > > At Thu, 19 Sep 2013 17:08:08 +0800,
> > > Liu Yuan wrote:
> > > > 
> > > > On Thu, Sep 19, 2013 at 02:42:54AM +0900, MORITA Kazutaka wrote:
> > > > > At Sat, 14 Sep 2013 18:34:24 +0800,
> > > > > Liu Yuan wrote:
> > > > > > 
> > > > > > cluster_join_check is basically used to check a newly joining node. But the old
> > > > > > code also checks the node states passed by cinfo against sys->cinfo. Now that we
> > > > > > have struct rb_node rb in sd_node, this check will never pass.
> > > > > > 
> > > > > > Instead of doing the check with more complex code, this patch simply removes the
> > > > > > check, since the node states in the joined nodes are always the same.
> > > > > 
> > > > > Is it true?  E.g. if a network partition happens and two subclusters are
> > > > > merged, the state of the joining node doesn't match.  The current code
> > > > > can detect it, but this patch removes the check?
> > > > > 
> > > > 
> > > > Why don't we allow nodes that were network partitioned to join back? If the
> > > > user asks to join the node, I think we should allow it to join back, no?
> > > 
> > > We cannot allow a rejoin after a network partition happens.  E.g.
> > > 
> > >   1. Sheepdog is running with nodes {A, B, C, D, E} at epoch 1.
> > >   2. The cluster is split into {A, B, C} and {D, E}.  Both are at
> > >      epoch 2.
> > >   3. The node A fails.  Then {B, C} is at epoch 3, and {D, E} is at
> > >      epoch 2.
> > >   4. If the two clusters are merged, their epoch numbers are not
> > >      consistent.
> > > 
> > > I think we should at least keep the epoch number check.
> > > 
> > 
> > I don't think we can do the epoch check either. Suppose
> > 
> > {A, B, C} at epoch 1
> > C goes down, then {A, B} is at epoch 2.
> > When we add C back with epoch 1, C should be able to join the cluster.
> 
> The example is too simple.  When talking about network partition,
> let's take into account the case where both subclusters have at least
> two nodes.
> 
> {A, B, C, D} at epoch 1.
> {A, B} and {C, D} at epoch 2.
> {A, B} at epoch 2, and {C} at epoch 3 (the node D has gone).
> 
> Then, if {A, B} and {C} are merged, epoch numbers become inconsistent.

What do you mean by inconsistent epochs? With the current code, in your case
C's cinfo will be reinitialized from A or B by update_cluster_info in the
accept handler. What exactly is the problem? If we simply check that the epochs
are equal, then we can't join a node back after it fails, no?
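
To make the two scenarios in this thread concrete, here is a small Python
model of them. It is only a sketch and not sheepdog code: Subcluster, lose and
strict_epoch_check are names made up for illustration, and the only rule
assumed is the one used in the examples above, namely that the epoch is bumped
each time the membership shrinks.

```python
from dataclasses import dataclass


@dataclass
class Subcluster:
    """Hypothetical model of a (sub)cluster: its members and current epoch."""
    members: set
    epoch: int

    def lose(self, node):
        # Membership shrinks (failure or partition), so the epoch is bumped.
        self.members = self.members - {node}
        self.epoch += 1


def strict_epoch_check(cluster_epoch, joining_epoch):
    """Hypothetical join check: accept only when both epochs are equal."""
    return cluster_epoch == joining_epoch


# Case 1: {A, B, C} at epoch 1, C goes down, then C comes back.
cluster = Subcluster({"A", "B", "C"}, 1)
cluster.lose("C")                  # {A, B} at epoch 2
c_epoch = 1                        # C still remembers epoch 1
simple_rejoin_ok = strict_epoch_check(cluster.epoch, c_epoch)

# Case 2: {A, B, C, D} splits into {A, B} and {C, D}, both at epoch 2,
# then D goes away and {C} moves on to epoch 3.
ab = Subcluster({"A", "B"}, 2)
cd = Subcluster({"C", "D"}, 2)
cd.lose("D")                       # {C} at epoch 3
partition_merge_ok = strict_epoch_check(ab.epoch, cd.epoch)

print("simple rejoin accepted:", simple_rejoin_ok)
print("partition merge accepted:", partition_merge_ok)
```

In this model a plain equality check rejects both cases, the ordinary rejoin of
a failed node as well as the merge after a partition, which is the concern
raised above.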

Thanks
Yuan


