[sheepdog] [PATCH v2 04/11] sheep: don't check nodes information for joined nodes

MORITA Kazutaka morita.kazutaka at gmail.com
Fri Sep 20 18:22:46 CEST 2013


At Thu, 19 Sep 2013 17:08:08 +0800,
Liu Yuan wrote:
> 
> On Thu, Sep 19, 2013 at 02:42:54AM +0900, MORITA Kazutaka wrote:
> > At Sat, 14 Sep 2013 18:34:24 +0800,
> > Liu Yuan wrote:
> > > 
> > > cluster_join_check is basically used to check a newly joining node. But the old
> > > code also checks the node states passed by cinfo against sys->cinfo. Now that we
> > > have struct rb_node rb in sd_node, this check will never pass.
> > > 
> > > Instead of doing the check with more complex code, this patch simply removes the
> > > check, since the node states in the joined nodes are always the same.
> > 
> > Is it true?  E.g. if a network partition happens and two subclusters
> > are merged, the state of the joining node doesn't match.  The current
> > code can detect it, but this patch removes the check?
> > 
> 
> Why don't we allow nodes that were network-partitioned to join back? If a user
> asks to join the node, I think we should allow the node to join back, no?

We cannot allow a rejoin after a network partition has happened.  E.g.

  1. Sheepdog is running with nodes {A, B, C, D, E} at epoch 1.
  2. The cluster is split into {A, B, C} and {D, E}.  Both are at
     epoch 2.
  3. Node A fails.  Then {B, C} is at epoch 3, and {D, E} is at
     epoch 2.
  4. If the two subclusters are merged, their epoch numbers are not
     consistent.

I think we should at least keep the epoch number check.
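
To make the scenario concrete, here is a minimal sketch of steps 1-4
above.  It is not the sheepdog code: struct cluster_view,
membership_changed(), epoch_matches() and merge_rejected_in_example()
are hypothetical names made up for illustration, and the sketch assumes
that every membership change increments the epoch by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the cluster as one subcluster sees it. */
struct cluster_view {
	uint32_t epoch;
	int nr_nodes;
};

/* Assumption: any membership change (split, failure) starts a new epoch. */
static void membership_changed(struct cluster_view *v, int nr_nodes)
{
	v->nr_nodes = nr_nodes;
	v->epoch++;
}

/* Minimal join check: both sides must be at the same epoch. */
static bool epoch_matches(const struct cluster_view *a,
			  const struct cluster_view *b)
{
	return a->epoch == b->epoch;
}

/* Replays steps 1-4; returns true if the merge is rejected. */
static bool merge_rejected_in_example(void)
{
	/* 1. {A, B, C, D, E} at epoch 1. */
	struct cluster_view abc = { .epoch = 1, .nr_nodes = 5 };
	struct cluster_view de = abc;

	/* 2. Split into {A, B, C} and {D, E}; both move to epoch 2. */
	membership_changed(&abc, 3);
	membership_changed(&de, 2);
	assert(epoch_matches(&abc, &de));

	/* 3. A fails: {B, C} is at epoch 3, {D, E} stays at epoch 2. */
	membership_changed(&abc, 2);

	/* 4. Merge attempt: 3 != 2, so the check refuses the join. */
	return !epoch_matches(&abc, &de);
}
```

Note that right after step 2 both sides are still at epoch 2, so an
epoch comparison alone would not notice the partition at that point.
That is why I see it as the minimum to keep, not a complete check.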

Thanks,

Kazutaka


