On 04/28/2012 06:15 PM, Liu Yuan wrote:
> I don't think so.
>
> see below example (with only my previous c4e3559758b2e fix)
>
> we have 2 nodes in the cluster (A, B), so
>
> init: nr_cpg_nodes = 2, A is the master
> C joins
> 1: A, B, C get a join_message(C)
>      for nr_cpg_nodes, A = 2, B = 2, C = 0
> 2: A crashed before sending response
> 3: B, C get a leave_message(A)
>      for nr_cpg_nodes, B = 2, C = 0
>      for is_master(), B = 1, C = 0
>      for join_finished, B = 1, C = 0
>    so now B is elected to be master and responsible to send_response()
> 4: everything goes okay.

I finally found some time to do more testing, and I ran the test below
to show that this is correct.

First, patch the master:

diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
index 4a588e9..e960088 100644
--- a/sheep/cluster/corosync.c
+++ b/sheep/cluster/corosync.c
@@ -280,6 +280,7 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 	enum cluster_join_result res;
 	struct sd_node entries[SD_MAX_NODES];
 	int idx;
+	static int i;
 
 	switch (cevent->type) {
 	case COROSYNC_EVENT_TYPE_JOIN:
@@ -300,6 +301,12 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 		if (res == CJ_RES_MASTER_TRANSFER)
 			nr_cpg_nodes = 0;
 
+		i++;
+		if (i == 3) {
+			dprintf("%d\n", i);
+			panic("Okay, I am forced out\n");
+		}
+
 		send_message(COROSYNC_MSG_TYPE_JOIN_RESPONSE, res,
 			     &cevent->sender, cpg_nodes, nr_cpg_nodes,
 			     cevent->msg, cevent->msg_len);

Then run the following script:

for i in 0 1; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done
echo simulate master is down before sending response
for i in 2; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i;sleep 1;done

The log then shows that nodes 1 and 2 join the cluster without any
problem.

Thanks,
Yuan