From: Liu Yuan <tailai.ly at taobao.com>

Consider a two-node cluster where A is the master and B is about to join:

A: I'm the master
B: send_join_msg
A: crashed before sending join_response
B: blocked forever

The fix is to let B proceed as far as possible, to the point where B can find that it has itself become the master.

1) Patch the code to simulate the problem:

@@ -280,6 +280,7 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 	enum cluster_join_result res;
 	struct sd_node entries[SD_MAX_NODES];
 	int idx;
+	static int i;
 
 	switch (cevent->type) {
 	case COROSYNC_EVENT_TYPE_JOIN:
@@ -300,6 +301,12 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 		if (res == CJ_RES_MASTER_TRANSFER)
 			nr_cpg_nodes = 0;
 
+		i++;
+		if (i == 2) {
+			dprintf("%d\n", i);
+			panic("Okay, I am forced out\n");
+		}
+
 		send_message(COROSYNC_MSG_TYPE_JOIN_RESPONSE, res,
 			     &cevent->sender, cpg_nodes, nr_cpg_nodes,
 			     cevent->msg, cevent->msg_len);

2) Run the following script:

for i in 0 1 2; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; sleep 1; done

With this patch we now handle this case cleanly: mastership is transferred from 0 to 1, and then from 1 to 2, even though each master crashes before sending its join response.

However, we still can't handle the following script:

for i in 0 ; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; sleep 1; done
for i in 1 2; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; done # no sleep

The question is: do we really need to handle the above artificial scenario?
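To illustrate why B hangs, here is a toy model of the event queue. It assumes, as a simplification that may not match sheepdog exactly, that the dispatcher handles events in FIFO order and stops at the first one that cannot proceed. `struct toy_event` and `toy_dispatch()` are hypothetical names, not sheepdog's `struct corosync_event` or its real dispatch loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy event; not sheepdog's struct corosync_event. */
struct toy_event {
	bool is_join;
	bool blocked;    /* join still waiting for the master's JOIN_RESPONSE */
	bool first_node; /* joiner is allowed to proceed as its own master */
	bool done;
};

/* Assumed FIFO dispatcher: handle events in order and stop at the first
 * one that cannot proceed. Returns how many events were handled. */
static size_t toy_dispatch(struct toy_event *queue, size_t n)
{
	assert(queue != NULL || n == 0);
	size_t handled = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_event *ev = &queue[i];

		if (ev->done)
			continue;
		/* A blocked join stalls everything queued behind it. */
		if (ev->is_join && ev->blocked && !ev->first_node)
			break;
		ev->done = true;
		handled++;
	}
	return handled;
}
```

If A dies before replying, nothing ever clears `blocked`, so every later event stays stuck. Setting `first_node` on B's pending join is the toy counterpart of the patch: once B knows it is alone, the join no longer waits for A and the queue drains.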
Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 sheep/cluster/corosync.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
index 4a588e9..66d1e03 100644
--- a/sheep/cluster/corosync.c
+++ b/sheep/cluster/corosync.c
@@ -592,6 +592,14 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
 		cevent->type = COROSYNC_EVENT_TYPE_LEAVE;
 		cevent->sender = left_sheep[i];
 
+		if (member_list_entries == left_list_entries - joined_list_entries) {
+			/* I am the last one in the cluster */
+			struct corosync_event *event;
+			event = find_block_event(COROSYNC_EVENT_TYPE_JOIN, &this_node);
+			if (event)
+				event->first_node = 1;
+		}
+
 		list_add_tail(&cevent->list, &corosync_event_list);
 	}
 
-- 
1.7.8.2
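The check added to cdrv_cpg_confchg() can be exercised in isolation with a small stand-alone sketch. `struct toy_cevent`, `toy_find_block_event()` and `toy_handle_leave()` are hypothetical stand-ins for `struct corosync_event`, `find_block_event()` and the new hunk; the queue is a plain singly linked list here rather than sheepdog's list_head. The membership condition is copied verbatim from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sheepdog's node id and corosync_event. */
struct toy_node { int id; };

enum { TOY_JOIN = 0, TOY_LEAVE = 1 };

struct toy_cevent {
	int type;                /* TOY_JOIN or TOY_LEAVE */
	struct toy_node sender;
	bool blocked;            /* waiting for a JOIN_RESPONSE */
	bool first_node;
	struct toy_cevent *next; /* singly linked event queue */
};

/* Rough analogue of find_block_event(): first blocked event of the given
 * type sent by `node`, or NULL if there is none. */
static struct toy_cevent *toy_find_block_event(struct toy_cevent *head, int type,
					       const struct toy_node *node)
{
	for (struct toy_cevent *ev = head; ev != NULL; ev = ev->next)
		if (ev->blocked && ev->type == type && ev->sender.id == node->id)
			return ev;
	return NULL;
}

/* Mirrors the hunk added to cdrv_cpg_confchg(). With unsigned counts, if
 * joined > left the subtraction wraps to a huge value and cannot match. */
static void toy_handle_leave(struct toy_cevent *queue, const struct toy_node *self,
			     size_t member_entries, size_t left_entries,
			     size_t joined_entries)
{
	if (member_entries == left_entries - joined_entries) {
		/* "I am the last one in the cluster" */
		struct toy_cevent *ev = toy_find_block_event(queue, TOY_JOIN, self);

		if (ev)
			ev->first_node = true;
	}
}
```

In the two-node scenario, when A crashes B sees one member (itself), one left node and no joined nodes, so 1 == 1 - 0 holds and B's own blocked join is flagged as `first_node`.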