[sheepdog] [PATCH] sheep: fix incorrect status transition when joining the cluster

Sat Jun 8 19:18:28 CEST 2013

Currently, sheep sets jm->cluster_status to SD_STATUS_OK if
1. it is the first node to join the cluster, and
2. the number of nodes in the latest epoch is 1

This behavior is invalid. For example, it allows the following
situation:
1. create a cluster with 2 nodes
2. format the cluster with --copies 2
3. kill one sheep
4. shut down the cluster
5. launch a single sheep
6. the status of the cluster is SD_STATUS_OK, although a single node
   cannot hold 2 copies

This patch fixes the problem by replacing the second condition with a
have_enough_zones() check.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
---
 sheep/group.c |   10 ++--------
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index f74ef10..48040bf 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -994,12 +994,10 @@ enum cluster_join_result sd_check_join_cb(const struct sd_node *joining,
 	}
 
 	if (node_is_local(joining)) {
-		struct sd_node entries[SD_MAX_NODES];
-		int nr_entries;
 		uint32_t epoch;
 
 		/*
-		 * If I'm the first sheep joins in corosync, I
+		 * If I'm the first sheep to join the cluster, I
 		 * becomes the master without sending JOIN.
 		 */
 
@@ -1017,14 +1015,10 @@ enum cluster_join_result sd_check_join_cb(const struct sd_node *joining,
 			return CJ_RES_FAIL;
 		}
 
-		nr_entries = epoch_log_read(epoch, entries, sizeof(entries));
-		if (nr_entries == -1)
-			return CJ_RES_FAIL;
-
 		sys->epoch = epoch;
 		jm->ctime = get_cluster_ctime();
 
-		if (nr_entries == 1)
+		if (have_enough_zones())
 			jm->cluster_status = SD_STATUS_OK;
 		return CJ_RES_SUCCESS;
 	}
-- 
1.7.5.1