[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request
Liu Yuan
namei.unix at gmail.com
Sat Apr 28 12:15:27 CEST 2012
On 04/28/2012 05:58 PM, Shevek wrote:
> Wrong. Read is_master():
>
> static int is_master(struct cpg_node *node)
> {
>         int i;
>         struct cpg_node *n = node;
>         if (!n)
>                 n = &this_node;
>         if (nr_cpg_nodes == 0)
>                 /* this node should be the first cpg node */
>                 return 0;
>
> ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set
> it to anything other than 0.
>
>         for (i = 0; i < SD_MAX_NODES; i++) {
>                 if (!cpg_nodes[i].gone)
>                         break;
>         }
>
>         if (cpg_node_equal(&cpg_nodes[i], n))
>                 return i;
>         return -1;
> }
>
> S.
I don't think so.
See the example below (with only my previous fix, c4e3559758b2e, applied).
We have 2 nodes in the cluster (A, B), so
init: nr_cpg_nodes = 2, A is the master
C joins
1: A, B, C get a join_message(C)
   for nr_cpg_nodes, A = 2, B = 2, C = 0
2: A crashes before sending the response
3: B, C get a leave_message(A)
   for nr_cpg_nodes, B = 2, C = 0
   for is_master(), B = 1, C = 0
   for join_finished, B = 1, C = 0
   so now B is elected master and is responsible for sending the response
4: everything goes okay.
What my patch didn't fix is the following scenario:
We have 1 node in the cluster (A), so
init: nr_cpg_nodes = 1, A is the master
C joins
1: A, C get a join_message(C)
   for nr_cpg_nodes, A = 1, C = 0
2: A crashes before sending the response
3: C gets a leave_message(A)
   for nr_cpg_nodes, C = 0
   for is_master(), C = 0
   for join_finished, C = 0
4: now C blocks forever, because no node is left to act as master
While writing out this example, I found that my next fix has a problem:
the newly joining node will be mistaken for the master.
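
The two scenarios above can be sketched as a small standalone model. This is
only an illustration, not the corosync driver code: struct node_view and all
function names here are made up, and the election rule is a simplified
assumption (a node with nr_cpg_nodes == 0 never considers itself master;
otherwise the first surviving established member does).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-node state; not the real struct cpg_node. */
struct node_view {
    char name;          /* node label, e.g. 'A' */
    int nr_cpg_nodes;   /* members this node knows about; 0 for a joiner */
    bool gone;          /* true once the node has crashed */
};

/*
 * Simplified election rule assumed for this sketch: a node that knows no
 * members (nr_cpg_nodes == 0) never considers itself master; otherwise the
 * first surviving established member is the master.
 */
static bool considers_itself_master(const struct node_view *nodes, size_t n,
                                    size_t self)
{
    if (nodes[self].gone || nodes[self].nr_cpg_nodes == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].gone || nodes[i].nr_cpg_nodes == 0)
            continue;   /* skip crashed nodes and joiners */
        return i == self;
    }
    return false;
}

/* Mark node `crashed` as gone, then count survivors that would take over. */
static int masters_after_crash(struct node_view *nodes, size_t n,
                               size_t crashed)
{
    int masters = 0;

    nodes[crashed].gone = true;
    for (size_t i = 0; i < n; i++)
        if (considers_itself_master(nodes, n, i))
            masters++;
    return masters;
}

/* Scenario 1: A and B are members, C is joining, A crashes. */
static int scenario_two_members(void)
{
    struct node_view nodes[] = {
        { 'A', 2, false }, { 'B', 2, false }, { 'C', 0, false },
    };
    return masters_after_crash(nodes, 3, 0);
}

/* Scenario 2: only A is a member, C is joining, A crashes. */
static int scenario_single_member(void)
{
    struct node_view nodes[] = {
        { 'A', 1, false }, { 'C', 0, false },
    };
    return masters_after_crash(nodes, 2, 0);
}
```

Under these assumptions, scenario_two_members() finds exactly one survivor (B)
willing to act as master, while scenario_single_member() finds none, which
corresponds to C blocking in the second scenario.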
Thanks,
Yuan