[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request

Liu Yuan namei.unix at gmail.com
Fri Apr 27 16:43:36 CEST 2012


On 04/27/2012 08:15 AM, Shevek wrote:

> +	// Exactly one non-master member has seen join events for all other
> +	// members, because events are ordered.
> +	for (i = 0; i < member_list_entries; i++) {
> +		struct cpg_node member = {
> +			.nodeid = member_list[i].nodeid,
> +			.pid = member_list[i].pid,
> +			};
> +		cevent = find_block_event(COROSYNC_EVENT_TYPE_JOIN, &member);
> +		if (cevent == NULL) {
> +			dprintf("Not promoting because member is not in our event list.");
> +			goto nopromote;
> +		}
> +	}
> +
> +	list_for_each_entry(cevent, &corosync_event_list, list) {
> +		dprintf("Setting first_node on event %p.", cevent);
> +		cevent->first_node = 1;
> +	}
> +nopromote:
> +


I think this fix is way too hacky. It abuses the 'first_node' flag, which,
IIUC, means 'the first node in the cluster' or 'the first group of nodes
in the cluster'.

I am not quite sure about this, but the fix really confuses me; it makes
the join phase hard to follow. Kazum, what do you think?

Thanks,
Yuan


