[sheepdog] [PATCH v3 04/11] sheep: check only joining nodes in sd_accept_handler

Liu Yuan namei.unix at gmail.com
Sat Sep 21 18:12:21 CEST 2013


Only the joining node needs to perform cluster_join_check. Also remove the
epoch check code, which only checks the latest epoch in an attempt to avoid
epoch inconsistency. The rationale is included in the source file.

Signed-off-by: Liu Yuan <namei.unix at gmail.com>
---
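A minimal standalone sketch of the gating logic this patch introduces: the
join check runs, and can abort the process, only when the accepted node is the
local one. Everything prefixed with demo_ is a hypothetical stand-in and not
part of sheep; the ctime comparison is a simplification of what
cluster_ctime_check does, and the real cluster_join_check may perform other
checks as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct sd_node and
 * struct cluster_info; the real structures carry many more fields. */
struct demo_node {
	uint32_t id;
};

struct demo_cluster_info {
	uint64_t ctime;
};

/* Stand-ins for the local node and its view of the cluster. */
static struct demo_node demo_local = { .id = 1 };
static struct demo_cluster_info demo_local_cinfo = { .ctime = 100 };

/* Assumed behaviour of node_is_local(): compare against the local node. */
static bool demo_node_is_local(const struct demo_node *n)
{
	return n->id == demo_local.id;
}

/* Simplified join check: only the cheap creation-time comparison is
 * kept, and no epoch history is compared. */
static bool demo_join_check(const struct demo_cluster_info *cinfo)
{
	assert(cinfo != NULL);
	return cinfo->ctime == demo_local_cinfo.ctime;
}

/* True when the accept handler would log an error and exit. Nodes that
 * are already cluster members skip the check entirely. */
static bool demo_should_abort(const struct demo_node *joined,
			      const struct demo_cluster_info *cinfo)
{
	return demo_node_is_local(joined) && !demo_join_check(cinfo);
}
```

With this shape, an existing member that sees another node being accepted
never exits because of a mismatch in that node's join information; only the
joining node itself can fail the check.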
 sheep/group.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index e721a96..3f5aa85 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -846,12 +846,17 @@ static bool cluster_join_check(const struct cluster_info *cinfo)
 	if (!cluster_ctime_check(cinfo))
 		return false;
 
-	if (cinfo->epoch == sys->cinfo.epoch &&
-	    memcmp(cinfo->nodes, sys->cinfo.nodes,
-		   sizeof(cinfo->nodes[0]) * cinfo->nr_nodes) != 0) {
-		sd_alert("epoch log entries does not match");
-		return false;
-	}
+	/*
+	 * Sheepdog's recovery code assumes every node has the same epoch
+	 * history. But we don't check a joining node's epoch history because:
+	 * 1. inconsistent epoch history only happens in the network partition
+	 *    case for the corosync driver, but the corosync driver panics in
+	 *    that case to prevent epoch inconsistency.
+	 * 2. checking epoch history with the joining node is too expensive and
+	 *    is unneeded for the zookeeper driver.
+	 *
+	 * So we don't check epoch history at all.
+	 */
 
 	return true;
 }
@@ -863,7 +868,7 @@ main_fn void sd_accept_handler(const struct sd_node *joined,
 	int i;
 	const struct cluster_info *cinfo = opaque;
 
-	if (!cluster_join_check(cinfo)) {
+	if (node_is_local(joined) && !cluster_join_check(cinfo)) {
 		sd_err("failed to join Sheepdog");
 		exit(1);
 	}
-- 
1.7.9.5



