[sheepdog] [PATCH v2 1/2] sheep/recovery: allocate old vinfo by using sys->cinfo

Robin Dong robin.k.dong at gmail.com
Fri Apr 25 09:00:25 CEST 2014


From: Robin Dong <sanbai at taobao.com>

Scenario:
  1. start up 4 sheep daemons in one cluster
  2. write data into the cluster
  3. "dog kill node 2" and wait for recovery to complete
     current epoch status ('O' for up-to-date epoch, 'X' for stale epoch):

        node: 1  2  3  4
       epoch: O  X  O  O

  4. kill all nodes
  5. start up all 4 sheep daemons again and wait for recovery to complete

Then we read the data back and find that it is corrupted.

The reason is that the code builds the old vinfo from the current cluster
membership, which contains all 4 nodes rather than the 3 nodes of the previous
epoch. We don't need to worry about "node 2", whose epoch is stale: it will
locate oids correctly during recovery because it uses current_vnode_info as
the 'cur_info' argument in start_recovery().

To solve this problem, we allocate the old vinfo from the node information
stored in the epoch (which has already been loaded into sys->cinfo) instead of
the information read from the new cluster (zookeeper, corosync, etc.).

Cc: Liu Yuan <namei.unix at gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
Signed-off-by: Robin Dong <sanbai at taobao.com>
---
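Note (not part of the commit message): the difference described above can be
sketched in isolation. This is only an illustration under stated assumptions:
struct node_set, contains(), old_from_cluster() and old_from_epoch() are
hypothetical, simplified stand-ins and not sheepdog's real types or helpers
(the real code keeps struct sd_node entries in an rb-tree). It also assumes
that node 4 is the one reported as "joined" after the restart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a node list; not a sheepdog type. */
#define MAX_NODES 8

struct node_set {
	int nr_nodes;
	int ids[MAX_NODES];
};

/* Return true if the set contains the given node id. */
static bool contains(const struct node_set *s, int id)
{
	for (int i = 0; i < s->nr_nodes; i++)
		if (s->ids[i] == id)
			return true;
	return false;
}

/*
 * Old approach (sketch): take the membership reported by the cluster
 * driver and drop only the node that has just joined.
 */
static struct node_set old_from_cluster(const struct node_set *cluster,
					int joined)
{
	struct node_set old = { 0 };

	for (int i = 0; i < cluster->nr_nodes; i++) {
		if (cluster->ids[i] == joined)
			continue;
		assert(old.nr_nodes < MAX_NODES);
		old.ids[old.nr_nodes++] = cluster->ids[i];
	}
	return old;
}

/*
 * New approach (sketch): copy the node list recorded in the last logged
 * epoch, which plays the role of sys->cinfo.nodes in the patch.
 */
static struct node_set old_from_epoch(const struct node_set *logged)
{
	assert(logged->nr_nodes <= MAX_NODES);
	return *logged;
}
```

With a cluster membership of {1, 2, 3, 4}, joined node 4, and a logged epoch
of {1, 3, 4}, old_from_cluster() yields {1, 2, 3}, which wrongly includes the
stale node 2, while old_from_epoch() yields {1, 3, 4}, the set that actually
held the data before the restart.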
 sheep/group.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index b0873d0..2c027fc 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -555,22 +555,14 @@ int inc_and_log_epoch(void)
 				sys->cinfo.nr_nodes);
 }
 
-static struct vnode_info *alloc_old_vnode_info(const struct sd_node *joined,
-					       const struct rb_root *nroot)
+static struct vnode_info *alloc_old_vnode_info(void)
 {
 	struct rb_root old_root = RB_ROOT;
-	struct sd_node *n;
 	struct vnode_info *old;
 
-	/* exclude the newly added one */
-	rb_for_each_entry(n, nroot, rb) {
+	for (int i = 0; i < sys->cinfo.nr_nodes; i++) {
 		struct sd_node *new = xmalloc(sizeof(*new));
-
-		*new = *n;
-		if (node_eq(joined, new)) {
-			free(new);
-			continue;
-		}
+		*new = sys->cinfo.nodes[i];
 		if (rb_insert(&old_root, new, rb, node_cmp))
 			panic("node hash collision");
 	}
@@ -669,15 +661,16 @@ static void update_cluster_info(const struct cluster_info *cinfo,
 			set_cluster_config(&sys->cinfo);
 
 		if (nr_nodes != cinfo->nr_nodes) {
-			int ret = inc_and_log_epoch();
+			int ret;
+			if (old_vnode_info)
+				put_vnode_info(old_vnode_info);
+
+			old_vnode_info = alloc_old_vnode_info();
+			ret = inc_and_log_epoch();
 			if (ret != 0)
 				panic("cannot log current epoch %d",
 				      sys->cinfo.epoch);
 
-			if (!old_vnode_info)
-				old_vnode_info = alloc_old_vnode_info(joined,
-								      nroot);
-
 			start_recovery(main_thread_get(current_vnode_info),
 				       old_vnode_info, true);
 		} else if (!was_cluster_shutdowned()) {
-- 
1.7.12.4



