[Sheepdog] [PATCH 3/3] cluster, corosync: do mastership transfer when master is down in join phase

Liu Yuan namei.unix at gmail.com
Wed Nov 30 13:00:00 CET 2011


From: Liu Yuan <tailai.ly at taobao.com>

If the master goes down before sending its response in the join phase,
we have to revoke its mastership to keep the cluster from hanging.

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 sheep/cluster/corosync.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
index 6f1eda4..01b3673 100644
--- a/sheep/cluster/corosync.c
+++ b/sheep/cluster/corosync.c
@@ -541,6 +541,7 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
 
 	/* dispatch leave_handler */
 	for (i = 0; i < left_list_entries; i++) {
+		int master;
 		cevent = find_block_event(COROSYNC_EVENT_TYPE_JOIN,
 					  left_sheep + i);
 		if (cevent) {
@@ -564,6 +565,13 @@ static void cdrv_cpg_confchg(cpg_handle_t handle,
 		if (!cevent)
 			panic("failed to allocate memory\n");
 
+		/* Master is down before new nodes finish joining.
+		 * We have to revoke its mastership to avoid cluster hanging.
+		 */
+		master = is_master(&left_sheep[i]);
+		if (master >= 0)
+			cpg_nodes[master].gone = 1;
+
 		cevent->type = COROSYNC_EVENT_TYPE_LEAVE;
 		cevent->sender = left_sheep[i];
 
-- 
1.7.8.rc3
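
For readers who want the idea in isolation, here is a minimal standalone
sketch of the mastership revocation described in the commit message. It is
not the sheepdog code: struct node, current_master(), master_index_of() and
handle_leave() are hypothetical names, and the rule "the master is the first
member not marked gone" is an assumption made for illustration only.

```c
/* Hypothetical stand-in for a cluster member; not sheepdog's real struct. */
struct node {
	unsigned int id;
	int gone;	/* set to 1 once this node's mastership is revoked */
};

/*
 * Assumed election rule: the master is the first member that is not
 * marked gone. Returns its index, or -1 if every member is gone.
 */
static int current_master(const struct node *nodes, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!nodes[i].gone)
			return i;
	return -1;
}

/*
 * Return the master's index if the node with the given id is the
 * current master, otherwise -1.
 */
static int master_index_of(const struct node *nodes, int nr, unsigned int id)
{
	int m = current_master(nodes, nr);

	if (m >= 0 && nodes[m].id == id)
		return m;
	return -1;
}

/*
 * Handle a leave notification. If the leaving node was the master,
 * mark it gone so that the next live member takes over right away
 * instead of everyone waiting on a response that will never arrive.
 */
static void handle_leave(struct node *nodes, int nr, unsigned int id)
{
	int master = master_index_of(nodes, nr, id);

	if (master >= 0)
		nodes[master].gone = 1;
}
```

With three members whose ids are 1, 2 and 3, a leave from node 2 changes
nothing, while a leave from node 1 marks it gone and current_master() then
returns index 1, so mastership moves to the next member in line.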



