On 04/28/2012 05:58 PM, Shevek wrote: > Wrong. Read is_master(): > > 243 static int is_master(struct cpg_node *node) > 244 { > 245 int i; > 246 struct cpg_node *n = node; > 247 if (!n) > 248 n = &this_node; > 249 if (nr_cpg_nodes == 0) > 250 /* this node should be the first cpg node */ > 251 return 0; > > ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set > it to anything other than 0. > > 252 > 253 for (i = 0; i < SD_MAX_NODES; i++) { > 254 if (!cpg_nodes[i].gone) > 255 break; > 256 } > 257 > 258 if (cpg_node_equal(&cpg_nodes[i], n)) > 259 return i; > 260 return -1; > 261 } > > S. I don't think so. see below example (with only my previous c4e3559758b2e fix) we have 2 nodes in the cluster (A, B), so init: nr_cpg_nodes = 2, A is the master C joins 1: A, B, C get a join_messgae(C) for nr_cpg_nodes, A = 2, B = 2, C = 0 2: A crashed before sending response 3: B, C get a leve_message(A) for nr_cpg_nodes, B = 2, C = 0 for is_master(), B = 1, C = 0 for join_finished, B = 1, C = 0 so now B is elected to be master and responsible to send_reponse() 4: everything goes okay. What my patch didn't fix is for following scene: we have 1 nodes in the cluster (A), so init: nr_cpg_nodes = 1, A is the master C joins 1: A, C get a join_messgae(C) for nr_cpg_nodes, A = 1, C = 0 2: A crashed before sending response 3: B, C get a leve_message(A) for nr_cpg_nodes, C = 0 for is_master(), C = 0 for join_finished C = 0 4: now C blocks for ever By writing this example, I found that my next fix has something wrong because the new joining node will be mistaken as master. Thanks, Yuan |