[Sheepdog] [PATCH v2] sheep: fix a network partition issue

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Oct 31 12:10:22 CET 2011


At Mon, 31 Oct 2011 18:15:06 +0800,
Liu Yuan wrote:
> 
> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
> 
> > At Tue, 25 Oct 2011 15:06:05 +0800,
> > Liu Yuan wrote:
> >>
> >> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
> >>
> >>> From: Yibin Shen <zituan at taobao.com>
> >>>
> >>> In some situations, a sheep daemon may be disconnected from corosync
> >>> momentarily; both sheep and corosync keep running and neither of them
> >>> exits.  The disconnected sheep may then receive a confchg message from
> >>> corosync notifying it that it has left the ring.  That leads to a
> >>> network partition; this patch fixes it.
> >>>
> >>> Signed-off-by: Yibin Shen <zituan at taobao.com>
> >>> ---
> >>>  sheep/group.c |    3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/sheep/group.c b/sheep/group.c
> >>> index e22dabc..ab5a9f0 100644
> >>> --- a/sheep/group.c
> >>> +++ b/sheep/group.c
> >>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> >>>  	struct work_leave *w = NULL;
> >>>  	int i, size;
> >>>  
> >>> +	if (node_cmp(left, &sys->this_node) == 0)
> >>> +		panic("BUG: this node can't be on the left list\n");
> >>> +
> >>
> >>
> >> Hmm, the panic output looks confusing.  How about "Network Partition
> >> Bug: I should have exited.\n"?  The output will be seen by
> >> administrators, not only programmers.
> > 
> > Applied after modifying output text, thanks!
> > 
> > Kazutaka
> 
> 
> Kazutaka,
> 	Maybe we should not panic when the cluster degenerates into a
> single-node cluster.  The node will change into the HALT state, which
> does no harm to its data.

That is much better.  Currently, Sheepdog kills a minority cluster in
__sd_leave() when a network partition occurs because it is the
simplest way to keep data consistent.
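
For illustration, the check is conceptually something like the sketch
below; the function and parameter names here are hypothetical, not the
actual group.c symbols:

#include <stdio.h>
#include <stdlib.h>

/*
 * Hedged sketch: after a membership change, a partition that retains
 * half or less of the previous ring treats itself as the minority and
 * exits to preserve data consistency (an even split errs on the side
 * of consistency).
 */
static void kill_minority_sketch(int nr_members, int nr_prev_members)
{
	if (nr_members * 2 <= nr_prev_members) {
		fprintf(stderr, "network partition detected, "
			"killing the minority cluster\n");
		exit(1);
	}
}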

But this looks like a different issue from the one this patch fixes.
Does corosync include the local node in the left list when a network
partition occurs?  If so, we should handle it in the corosync cluster
driver, because it looks like a corosync-specific issue to me.
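
If it does, one conceivable fix is to drop the local node from the
left list in the corosync driver before the event reaches the generic
code.  A minimal sketch, assuming hypothetical names (cpg_node,
this_node, same_node()) rather than the driver's real ones:

#include <stddef.h>
#include <stdint.h>

struct cpg_node {
	uint32_t nodeid;
	uint32_t pid;
};

static struct cpg_node this_node;

static int same_node(const struct cpg_node *a, const struct cpg_node *b)
{
	return a->nodeid == b->nodeid && a->pid == b->pid;
}

/* Drop the local node from the left list; return the new length. */
static size_t filter_self_from_left(struct cpg_node *left, size_t nr_left)
{
	size_t i, n = 0;

	for (i = 0; i < nr_left; i++) {
		if (same_node(&left[i], &this_node))
			continue;	/* a rejoin in progress, not a real leave */
		left[n++] = left[i];
	}
	return n;
}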


Thanks,

Kazutaka

> 
> 	This was introduced by a corosync bug which Yunkai fixed in the
> latest corosync.  While working on that fix, Yunkai found that corosync
> will likely rejoin the old configuration soon after a short break-away
> from the other corosync nodes.
> 
> For example, suppose (a,b,c) is a corosync ring.  At some point n(c)
> breaks out and becomes a single-node ring by itself; later, n(c)
> rejoins:
> 
> (a,b,c) -> (a,b), (c) -> (a,b,c)
>          |             |
>      confchg1      confchg2
> 
> So the question is: do we have to panic n(c) at confchg1 in this
> case?  n(c) does no harm to the data, and after confchg2, I think,
> IIUC, n(c) will see the view as (a,b,c) again.  No?
> 
> Thanks,
> Yuan
