[Sheepdog] [PATCH 4/4] sheep: tame sheep to recover the crash cluster

Liu Yuan namei.unix at gmail.com
Sat Sep 24 06:14:55 CEST 2011


From: Liu Yuan <tailai.ly at taobao.com>

Currently, we have to start up the frist failed node or last failed one to
recover the crash cluster (nodes with different epoch histories). This patch
simply remove this disgusting constraint.

To this point, we can precisely define 'leave node'.

Leave Node:
	Crash cluster: For the master node (first started node), leave nodes are nodes
	that are contained in the master epoch and are supposed to leave during the recovery
	stage. That is, leave nodes are nodes that enable the master to get the knowledge
	of when to recover.

	Shutdown cluster: Account for unhealthy nodes that are supposed to leave during the
	recovery stage. This enables nodes alive in the cluster to get the knowledge of when
	to recover.

With this patch, there is no start-up order imposed for the crash cluster to recover. We can
do this because the epoch on each node has the node with the highest epoch number contained.

The method that tries to test this idea:

$ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
$ collie/collie cluster format
$ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
$ for i in 1 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
$ for i in 0 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
$ for i in 0 1 2; do ./collie/collie cluster info -p 700$i; done
Cluster status: running

Creation time        Epoch Nodes
2011-09-24 11:45:52      6 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      5 [192.168.0.4:7000, 192.168.0.4:7001]
2011-09-24 11:45:52      4 [192.168.0.4:7001]
2011-09-24 11:45:52      3 [192.168.0.4:7002]
2011-09-24 11:45:52      2 [192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      1 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-24 11:45:52      6 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      5 [192.168.0.4:7000, 192.168.0.4:7001]
2011-09-24 11:45:52      4 [192.168.0.4:7001]
2011-09-24 11:45:52      3 [192.168.0.4:7002]
2011-09-24 11:45:52      2 [192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      1 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]
Cluster status: running

Creation time        Epoch Nodes
2011-09-24 11:45:52      6 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      5 [192.168.0.4:7000, 192.168.0.4:7001]
2011-09-24 11:45:52      4 [192.168.0.4:7001]
2011-09-24 11:45:52      3 [192.168.0.4:7002]
2011-09-24 11:45:52      2 [192.168.0.4:7001, 192.168.0.4:7002]
2011-09-24 11:45:52      1 [192.168.0.4:7000, 192.168.0.4:7001, 192.168.0.4:7002]

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 sheep/group.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index 812f6a0..53846cb 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -435,10 +435,27 @@ static struct sheepdog_node_list_entry *find_entry_list(struct sheepdog_node_lis
 	return NULL;
 
 }
+
+static struct sheepdog_node_list_entry *find_entry_epoch(struct sheepdog_node_list_entry *entry,
+							 int epoch)
+{
+	struct sheepdog_node_list_entry nodes[SD_MAX_NODES];
+	int nr, i;
+
+	nr = epoch_log_read(epoch, (char *)nodes, sizeof(nodes));
+	nr /= sizeof(nodes[0]);
+
+	for (i = 0; i < nr; i++)
+		if (node_cmp(&nodes[i], entry) == 0)
+			return entry;
+
+	return NULL;
+}
+
 static int add_node_to_leave_list(struct message_header *msg)
 {
 	int ret = SD_RES_SUCCESS;
-	int nr, i;
+	int nr, i, le = get_latest_epoch();
 	LIST_HEAD(tmp_list);
 	struct node *n, *t;
 	struct join_message *jm;
@@ -450,7 +467,8 @@ static int add_node_to_leave_list(struct message_header *msg)
 			goto err;
 		}
 
-		if (find_entry_list(&msg->from, &sys->leave_list)) {
+		if (find_entry_list(&msg->from, &sys->leave_list)
+		    || !find_entry_epoch(&msg->from, le)) {
 			free(n);
 			goto ret;
 		}
@@ -471,7 +489,8 @@ static int add_node_to_leave_list(struct message_header *msg)
 				goto free;
 			}
 
-			if (find_entry_list(&jm->leave_nodes[i].ent, &sys->leave_list)) {
+			if (find_entry_list(&jm->leave_nodes[i].ent, &sys->leave_list)
+			    || !find_entry_epoch(&jm->leave_nodes[i].ent, le)) {
 				free(n);
 				continue;
 			}
-- 
1.7.6.1




More information about the sheepdog mailing list