[sheepdog] [PATCH v3 04/11] sheep: check only joining nodes in sd_accept_handler
Liu Yuan
namei.unix at gmail.com
Sat Sep 21 18:12:21 CEST 2013
Only the joining node needs to perform cluster_join_check. Also remove the
epoch check, which only compared the latest epoch in an attempt to avoid epoch
inconsistency. The rationale is included in the source file.
Signed-off-by: Liu Yuan <namei.unix at gmail.com>
---
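A minimal standalone sketch of the guard this patch adds to sd_accept_handler,
for readers who want to see the behaviour in isolation. Everything in it is a
hypothetical stand-in, not sheepdog's real code: struct node, the simplified
struct cluster_info, LOCAL_ID, LOCAL_CTIME, and the *_sketch functions are
assumptions made for illustration, and the ctime comparison only approximates
what cluster_ctime_check might verify. Where the real handler calls exit(1),
the sketch returns false.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; sheepdog's real structures are much larger. */
struct node { int id; };
struct cluster_info { long ctime; };

#define LOCAL_ID    1    /* assumed id of the node running this code */
#define LOCAL_CTIME 100L /* assumed creation time known to the local node */

static int nr_checks;    /* counts how often the join check actually ran */

/* Stand-in for node_is_local(): is the joined node this very node? */
static bool node_is_local_sketch(const struct node *n)
{
	return n->id == LOCAL_ID;
}

/*
 * Stand-in for cluster_join_check(): after this patch only a ctime-style
 * check remains; the epoch comparison is gone.
 */
static bool join_check_sketch(const struct cluster_info *cinfo)
{
	nr_checks++;
	return cinfo->ctime == LOCAL_CTIME;
}

/*
 * Stand-in for the guard in sd_accept_handler(): the check runs only on
 * the node that is joining. Existing members skip it entirely thanks to
 * the short-circuit evaluation of &&.
 */
static bool accept_sketch(const struct node *joined,
			  const struct cluster_info *cinfo)
{
	if (node_is_local_sketch(joined) && !join_check_sketch(cinfo))
		return false; /* the real handler logs an error and exits */
	return true;
}
```

With these stand-ins, an existing member that sees another node join never
runs the check, even if the cluster info would fail it, while the joining
node itself is rejected on a mismatch.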
sheep/group.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/sheep/group.c b/sheep/group.c
index e721a96..3f5aa85 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -846,12 +846,17 @@ static bool cluster_join_check(const struct cluster_info *cinfo)
if (!cluster_ctime_check(cinfo))
return false;
- if (cinfo->epoch == sys->cinfo.epoch &&
- memcmp(cinfo->nodes, sys->cinfo.nodes,
- sizeof(cinfo->nodes[0]) * cinfo->nr_nodes) != 0) {
- sd_alert("epoch log entries does not match");
- return false;
- }
+ /*
+ * Sheepdog's recovery code assumes every node has the same epoch
+ * history. But we don't check the joining node's epoch history because:
+ * 1. an inconsistent epoch history only happens in the network
+ * partition case for the corosync driver, and the corosync driver
+ * panics in that case to prevent epoch inconsistency.
+ * 2. checking epoch history with the joining node is too expensive and
+ * is unnecessary for the zookeeper driver.
+ *
+ * That said, we don't check epoch history at all.
+ */
return true;
}
@@ -863,7 +868,7 @@ main_fn void sd_accept_handler(const struct sd_node *joined,
int i;
const struct cluster_info *cinfo = opaque;
- if (!cluster_join_check(cinfo)) {
+ if (node_is_local(joined) && !cluster_join_check(cinfo)) {
sd_err("failed to join Sheepdog");
exit(1);
}
--
1.7.9.5