[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request
Liu Yuan
namei.unix at gmail.com
Sat Apr 28 12:58:11 CEST 2012
On 04/28/2012 06:15 PM, Liu Yuan wrote:
> I don't think so.
>
> See the example below (with only my previous c4e3559758b2e fix applied).
>
> We have 2 nodes in the cluster (A and B):
>
> init: nr_cpg_nodes = 2, A is the master
> C joins
> 1: A, B and C receive a join_message(C)
> for nr_cpg_nodes, A = 2, B = 2, C = 0
> 2: A crashes before sending the response
> 3: B and C receive a leave_message(A)
> for nr_cpg_nodes, B = 2, C = 0
> for is_master(), B = 1, C = 0
> for join_finished, B = 1, C = 0
> so B is now elected master and is responsible for send_response()
> 4: everything works fine.
I finally got some time to do more testing, and I ran the test below to
prove it correct.
First, patch the master:
diff --git a/sheep/cluster/corosync.c b/sheep/cluster/corosync.c
index 4a588e9..e960088 100644
--- a/sheep/cluster/corosync.c
+++ b/sheep/cluster/corosync.c
@@ -280,6 +280,7 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 	enum cluster_join_result res;
 	struct sd_node entries[SD_MAX_NODES];
 	int idx;
+	static int i;
 
 	switch (cevent->type) {
 	case COROSYNC_EVENT_TYPE_JOIN:
@@ -300,6 +301,12 @@ static int __corosync_dispatch_one(struct corosync_event *cevent)
 		if (res == CJ_RES_MASTER_TRANSFER)
 			nr_cpg_nodes = 0;
 
+		i++;
+		if (i == 3) {
+			dprintf("%d\n", i);
+			panic("Okay, I am forced out\n");
+		}
+
 		send_message(COROSYNC_MSG_TYPE_JOIN_RESPONSE, res,
 			     &cevent->sender, cpg_nodes, nr_cpg_nodes,
 			     cevent->msg, cevent->msg_len);
Then run the following script:
for i in 0 1; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; sleep 1; done
echo simulate master is down before sending response
for i in 2; do sheep/sheep -a -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i; sleep 1; done
The log then shows that nodes 1 and 2 join the cluster without problems.
Thanks,
Yuan
More information about the sheepdog mailing list