[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request

Liu Yuan namei.unix at gmail.com
Sat Apr 28 12:15:27 CEST 2012


On 04/28/2012 05:58 PM, Shevek wrote:

> Wrong. Read is_master():
> 
>     243 static int is_master(struct cpg_node *node)
>     244 {
>     245     int i;
>     246     struct cpg_node *n = node;
>     247     if (!n)
>     248         n = &this_node;
>     249     if (nr_cpg_nodes == 0)
>     250         /* this node should be the first cpg node */
>     251         return 0;
> 
> ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set
> it to anything other than 0.
> 
>     252 
>     253     for (i = 0; i < SD_MAX_NODES; i++) {
>     254         if (!cpg_nodes[i].gone)
>     255             break;
>     256     }
>     257 
>     258     if (cpg_node_equal(&cpg_nodes[i], n))
>     259         return i;
>     260     return -1;
>     261 }
> 
> S.


I don't think so.

see below example (with only my previous c4e3559758b2e fix)

we have 2 nodes in the cluster (A, B), so

init: nr_cpg_nodes = 2, A is the master
	C joins
1: A, B, C get a join_messgae(C)
  for nr_cpg_nodes, A = 2, B = 2, C = 0
2: A crashed before sending response
3: B, C get a leve_message(A)
  for nr_cpg_nodes, B = 2, C = 0
  for is_master(), B = 1, C = 0
  for join_finished, B = 1, C = 0
  so now B is elected to be master and responsible to send_reponse()
4: everything goes okay.

What my patch didn't fix is for following scene:

we have 1 nodes in the cluster (A), so

init: nr_cpg_nodes = 1, A is the master
	C joins
1: A, C get a join_messgae(C)
  for nr_cpg_nodes, A = 1, C = 0
2: A crashed before sending response
3: B, C get a leve_message(A)
  for nr_cpg_nodes, C = 0
  for is_master(), C = 0
  for join_finished C = 0
4: now C blocks for ever

By writing this example, I found that my next fix has something wrong
because the new joining node will be mistaken as master.

Thanks,
Yuan



More information about the sheepdog mailing list