[sheepdog] [PATCH v1 1/2] sheep/recovery: allocate old vinfo by using sys->cinfo

Robin Dong robin.k.dong at gmail.com
Wed Apr 23 08:29:49 CEST 2014


From: Robin Dong <sanbai at taobao.com>

Scenario:
  1. start up 6 sheep daemons in one cluster
  2. write data into the cluster
  3. kill node 2 (dog node kill 2) and wait for recovery to complete
  4. kill all nodes
  5. start up the 6 sheep daemons again and wait for recovery to complete

Then we read the data back and find that it is corrupted.

The reason is that the present code assumes the last joined node is the node
that was missing (killed) in the previous epoch. This assumption holds when a new
node is added to the cluster, but not when a cluster is started up with a node
that had failed earlier. To solve this problem, we allocate the old vinfo from the
node information stored in the epoch (which has already been loaded into
sys->cinfo) instead of from the node list reported by the new cluster membership
(zookeeper/corosync, etc.).
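The failure mode can be illustrated with a minimal C model. Everything in it is
hypothetical and heavily simplified: node IDs are plain ints, a membership is a
sorted array (not sheepdog's struct sd_node / rb_root), and we assume node 5
happens to be the last one to join on restart. old_from_joined() mimics the
pre-patch logic (current membership minus the last joined node), and
old_from_epoch() mimics the patched logic (reuse the node list saved with the
epoch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not sheepdog's real data structures. */
#define MAX_NODES 16

struct membership {
	int ids[MAX_NODES];	/* node IDs, kept sorted ascending */
	size_t nr;		/* number of valid entries in ids[] */
};

/* Build a membership from an already sorted array of node IDs. */
static struct membership make_membership(const int *ids, size_t n)
{
	assert(n <= MAX_NODES);
	struct membership m = { .nr = n };
	for (size_t i = 0; i < n; i++)
		m.ids[i] = ids[i];
	return m;
}

/* Pre-patch idea: previous membership = current membership minus the node
 * that joined last. Only correct if that node is the one that was absent. */
static struct membership old_from_joined(const struct membership *cur,
					 int joined)
{
	struct membership old = { .nr = 0 };
	for (size_t i = 0; i < cur->nr; i++)
		if (cur->ids[i] != joined)
			old.ids[old.nr++] = cur->ids[i];
	return old;
}

/* Patched idea: previous membership = node list recorded with the epoch. */
static struct membership old_from_epoch(const struct membership *epoch)
{
	return *epoch;
}

/* Element-wise comparison; both memberships are assumed to be sorted. */
static bool same_membership(const struct membership *a,
			    const struct membership *b)
{
	if (a->nr != b->nr)
		return false;
	for (size_t i = 0; i < a->nr; i++)
		if (a->ids[i] != b->ids[i])
			return false;
	return true;
}
```

With the epoch written after step 3 holding {0, 1, 3, 4, 5} and all six nodes
back, old_from_joined(&cur, 5) yields {0, 1, 2, 3, 4}, which is not the previous
epoch, so recovery would compute object locations from the wrong node set. In
the plain node-addition case (previous epoch {0, 1, 2, 3, 4}, node 5 joins) the
same function happens to give the right answer, which is why the bug only shows
up on restart.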

Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
Signed-off-by: Robin Dong <sanbai at taobao.com>
---
 sheep/group.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index b0873d0..2c027fc 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -555,22 +555,14 @@ int inc_and_log_epoch(void)
 				sys->cinfo.nr_nodes);
 }
 
-static struct vnode_info *alloc_old_vnode_info(const struct sd_node *joined,
-					       const struct rb_root *nroot)
+static struct vnode_info *alloc_old_vnode_info(void)
 {
 	struct rb_root old_root = RB_ROOT;
-	struct sd_node *n;
 	struct vnode_info *old;
 
-	/* exclude the newly added one */
-	rb_for_each_entry(n, nroot, rb) {
+	for (int i = 0; i < sys->cinfo.nr_nodes; i++) {
 		struct sd_node *new = xmalloc(sizeof(*new));
-
-		*new = *n;
-		if (node_eq(joined, new)) {
-			free(new);
-			continue;
-		}
+		*new = sys->cinfo.nodes[i];
 		if (rb_insert(&old_root, new, rb, node_cmp))
 			panic("node hash collision");
 	}
@@ -669,15 +661,16 @@ static void update_cluster_info(const struct cluster_info *cinfo,
 			set_cluster_config(&sys->cinfo);
 
 		if (nr_nodes != cinfo->nr_nodes) {
-			int ret = inc_and_log_epoch();
+			int ret;
+			if (old_vnode_info)
+				put_vnode_info(old_vnode_info);
+
+			old_vnode_info = alloc_old_vnode_info();
+			ret = inc_and_log_epoch();
 			if (ret != 0)
 				panic("cannot log current epoch %d",
 				      sys->cinfo.epoch);
 
-			if (!old_vnode_info)
-				old_vnode_info = alloc_old_vnode_info(joined,
-								      nroot);
-
 			start_recovery(main_thread_get(current_vnode_info),
 				       old_vnode_info, true);
 		} else if (!was_cluster_shutdowned()) {
-- 
1.7.12.4
