[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request

Sat Apr 28 11:22:22 CEST 2012

On Sat, 2012-04-28 at 11:02 +0800, Liu Yuan wrote:
> On 04/27/2012 10:43 PM, Liu Yuan wrote:
> 
> > On 04/27/2012 08:15 AM, Shevek wrote:
> > 
> >> +	// Exactly one non-master member has seen join events for all other
> >> +	// members, because events are ordered.
> >> +	for (i = 0; i < member_list_entries; i++) {
> >> +		struct cpg_node member = {
> >> +			.nodeid = member_list[i].nodeid,
> >> +			.pid = member_list[i].pid,
> >> +			};
> >> +		cevent = find_block_event(COROSYNC_EVENT_TYPE_JOIN, &member);
> >> +		if (cevent == NULL) {
> >> +			dprintf("Not promoting because member is not in our event list.");
> >> +			goto nopromote;
> >> +		}
> >> +	}
> >> +
> >> +	list_for_each_entry(cevent, &corosync_event_list, list) {
> >> +		dprintf("Setting first_node on event %p.", cevent);
> >> +		cevent->first_node = 1;
> >> +	}
> >> +nopromote:
> >> +
> > 
> > 
> > I think the fix is the way too hacky. The fix here abuse the 'first
> > node' denotation which is to mean, IIUC, 'first node in the cluster' or
> > 'first group of nodes in the cluster'.
> > 
> > I am not quit sure about this, but the fix really confuses me, it makes
> > the join phase elusive. Kazum, how do you think of it?
> > 
> > Thanks,
> > Yuan
> 
> 
> Hi, Shevek,
> 	I have drafted a patch against the issue, could you please try the
> following patch?
> 
> Thanks,
> Yuan
> 
> From 3ace66e31e3a5dd6886b340e54a37f31594d25dc Mon Sep 17 00:00:00 2001
> From: Liu Yuan <tailai.ly at taobao.com>
> Date: Sat, 28 Apr 2012 10:41:57 +0800
> Subject: [PATCH] corosync: fix cluster hang
> 
> Consider two nodes cluster, A is the master and B is going
> to join.
>                 A: I'm the master
> B: send_join_msg
>                 A: crached before sending join_response
> B: blocked for ever
> 
> The fix is let B go as far as possible, where B can find himself
> become the master.
> 
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
>  sheep/cluster/corosync.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
> index 4a588e9..7f52385 100644
> --- a/sheep/cluster/corosync.c
> +++ b/sheep/cluster/corosync.c
> @@ -393,7 +393,7 @@ static void __corosync_dispatch(void)
>                         }
>                 }
> 
> -               if (join_finished)
> +               if (join_finished || is_master(&cevent->sender))

I believe this patch does NOT work:

1) is_master returns -1 for not-master, 0 means the given node is
master, so this is apparently a misuse of is_master.

However, even worse:

2) If we never got a join response, then line 388:
    388 if (!cevent->blocked && cpg_node_equal(&cevent->sender,
&this_node)) {
never triggered, therefore we have never set join_finished, and
nr_cpg_nodes is always 0 (look for every place where it got set), so
is_master() is always 0.

So, by inspection, this patch can never work, for both reasons.

Our previous patch does work. At the point when we set first_node = 1
(which was a hack in the first place, but not our hack), we are actually
the first (still living) node in the corosync group. So it isn't abuse
of the field. It is using it for its original purpose, just with a
different condition.

S.

>                         done = __corosync_dispatch_one(cevent);
>                 else
>