[Sheepdog] [PATCH v2] sheep: fix a network partition issue

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Oct 31 11:00:04 CET 2011


At Tue, 25 Oct 2011 15:06:05 +0800,
Liu Yuan wrote:
> 
> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
> 
> > From: Yibin Shen <zituan at taobao.com>
> > 
> > In some situations, sheep may be disconnected from corosync momentarily
> > while both sheep and corosync keep running and neither of them exits.
> > The disconnected sheep may then receive a confchg message from corosync
> > notifying it that this very sheep has left.
> > That leads to a network partition; this patch fixes it.
> > 
> > Signed-off-by: Yibin Shen <zituan at taobao.com>
> > ---
> >  sheep/group.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/sheep/group.c b/sheep/group.c
> > index e22dabc..ab5a9f0 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> >  	struct work_leave *w = NULL;
> >  	int i, size;
> >  
> > +	if (node_cmp(left, &sys->this_node) == 0)
> > +		panic("BUG: this node can't be on the left list\n");
> > +
> 
> 
> Hmm, the panic output looks confusing. How about "Network Patition Bug:
> I should have exited.\n"? The output will be seen by
> administrators, not only by programmers.

Applied after modifying the output text, thanks!

Kazutaka
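As a rough illustration of the check the patch adds to sd_leave_handler(), here is a standalone sketch. The `struct node_entry` layout, `node_entry_cmp()` and `check_leave()` are hypothetical stand-ins, not sheepdog's actual definitions. The sketch assumes a node is identified by its address plus port, and it returns an error code instead of calling panic() so the logic can be exercised in isolation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node identity: a 16-byte address plus a port.
 * This layout is an assumption for illustration only. */
struct node_entry {
	uint8_t  addr[16];
	uint16_t port;
};

/* Order nodes by address, then by port; 0 means "same node". */
static int node_entry_cmp(const struct node_entry *a,
			  const struct node_entry *b)
{
	int cmp = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (cmp != 0)
		return cmp;
	if (a->port != b->port)
		return a->port < b->port ? -1 : 1;
	return 0;
}

/* Returns -1 when the membership layer reports this node itself as having
 * left (the partition case the patch guards against), 0 for a normal leave.
 * The real patch panics here instead of returning. */
static int check_leave(const struct node_entry *left,
		       const struct node_entry *self)
{
	if (node_entry_cmp(left, self) == 0) {
		fprintf(stderr, "network partition: this node should have exited\n");
		return -1;
	}
	return 0;
}
```

A peer leaving yields 0, while a leave notification naming the local node yields -1. Comparing on address and port, rather than on pointer identity, matters because the leave notification carries a copy of the node entry, not a pointer to `sys->this_node`.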


