[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Mon Oct 31 12:49:17 CET 2011

At Mon, 31 Oct 2011 19:37:51 +0800,
Liu Yuan wrote:
> 
> On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:
> 
> > At Mon, 31 Oct 2011 18:15:06 +0800,
> > Liu Yuan wrote:
> >>
> >> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
> >>
> >>> At Tue, 25 Oct 2011 15:06:05 +0800,
> >>> Liu Yuan wrote:
> >>>>
> >>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
> >>>>
> >>>>> From: Yibin Shen <zituan at taobao.com>
> >>>>>
> >>>>> In some situations, sheep may be disconnected from corosync momentarily
> >>>>> while both sheep and corosync keep running and neither of them exits.
> >>>>> The disconnected sheep may then receive a confchg message from corosync
> >>>>> notifying it that this sheep itself has left. That leads to a network
> >>>>> partition; this patch fixes it.
> >>>>>
> >>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
> >>>>> ---
> >>>>>  sheep/group.c |    3 +++
> >>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/sheep/group.c b/sheep/group.c
> >>>>> index e22dabc..ab5a9f0 100644
> >>>>> --- a/sheep/group.c
> >>>>> +++ b/sheep/group.c
> >>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> >>>>>  	struct work_leave *w = NULL;
> >>>>>  	int i, size;
> >>>>>  
> >>>>> +	if (node_cmp(left, &sys->this_node) == 0)
> >>>>> +		panic("BUG: this node can't be on the left list\n");
> >>>>> +
> >>>>
> >>>>
> >>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
> >>>> I should have exited.\n"? The output will be seen by administrators,
> >>>> not only programmers.
> >>>
> >>> Applied after modifying output text, thanks!
> >>>
> >>> Kazutaka
> >>
> >>
> >> Kazutaka,
> >> 	Maybe we should not panic out when it becomes a single node cluster.
> >> The node will change into the HALT state, which doesn't do any harm to its data.
> > 
> > It is much better.  Currently, Sheepdog kills a minority cluster in
> > __sd_leave() when a network partition occurs because it is the simplest
> > solution to keep data consistent.
> > 
> > But this looks like a different issue from this patch.  Does corosync
> > include the local node in the left list when a network partition occurs?
> > If so, we should handle it in the corosync cluster driver because it
> > looks like a corosync-specific issue to me.
> > 
> 
> I am not sure, but if corosync includes the local node in the left list, it
> should be a bug in corosync.
> 
> Let's assume three nodes (a,b,c).
> I suspect that the left message is for n(b,c), but after n(a)
> rejoins, for whatever reason, the message is still being broadcast, and
> n(a) just gets it wrongly.

IIUC, n(a) should receive the left message of n(b,c).
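
To make that expectation concrete, here is a minimal sketch in C. It
assumes a hypothetical struct node_id and hypothetical helpers
(node_equal, left_list_has_self) that are not part of sheepdog or
corosync. It only shows the check that tells a sane left list, such as
(b,c) as seen from n(a), apart from one that wrongly contains the local
node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a node identity; not sheepdog's own struct. */
struct node_id {
	char addr[16];
	unsigned short port;
};

/* Two nodes are considered the same if address and port both match. */
static bool node_equal(const struct node_id *x, const struct node_id *y)
{
	return x->port == y->port && strcmp(x->addr, y->addr) == 0;
}

/*
 * Return true if the local node appears in the left list delivered with
 * a configuration change. From n(a)'s point of view, a left list of
 * (b,c) is expected after a partition; a left list containing a itself
 * is the suspicious case discussed in this thread.
 */
static bool left_list_has_self(const struct node_id *left, size_t nr_left,
			       const struct node_id *self)
{
	for (size_t i = 0; i < nr_left; i++)
		if (node_equal(&left[i], self))
			return true;
	return false;
}
```

Where such a check would live (generic leave handler or cluster driver)
and what to do when it fires are left open here; the sketch only covers
the detection.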

> 
> Yunkai's going to patch corosync for related bugs that occur when a corosync
> node leaves and rejoins the old ring. With those bug fixes, let's see if
> this kind of problem (a node receives a message saying that it has left
> itself) would still exist.

Okay.

Thanks,

Kazutaka