On Sat, 2012-04-28 at 18:15 +0800, Liu Yuan wrote: > On 04/28/2012 05:58 PM, Shevek wrote: > > > Wrong. Read is_master(): > > > > 243 static int is_master(struct cpg_node *node) > > 244 { > > 245 int i; > > 246 struct cpg_node *n = node; > > 247 if (!n) > > 248 n = &this_node; > > 249 if (nr_cpg_nodes == 0) > > 250 /* this node should be the first cpg node */ > > 251 return 0; > > > > ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set > > it to anything other than 0. > > > > 252 > > 253 for (i = 0; i < SD_MAX_NODES; i++) { > > 254 if (!cpg_nodes[i].gone) > > 255 break; > > 256 } > > 257 > > 258 if (cpg_node_equal(&cpg_nodes[i], n)) > > 259 return i; > > 260 return -1; > > 261 } > > > > S. > > > I don't think so. > > see below example (with only my previous c4e3559758b2e fix) > > we have 2 nodes in the cluster (A, B), so > > init: nr_cpg_nodes = 2, A is the master > C joins > 1: A, B, C get a join_messgae(C) > for nr_cpg_nodes, A = 2, B = 2, C = 0 Now is_master() returns 0 in C for any argument, so C elects itself master, and the patch is still wrong. Of course, since it isn't cpg_nodes[0], it doesn't stay master for long, but this is still wrong. You can NOT call is_master() before cpg_nodes is set up correctly. Any patch using is_master() to fix this bug is automatically wrong. Our patch handles this case correctly. > 2: A crashed before sending response > 3: B, C get a leve_message(A) > for nr_cpg_nodes, B = 2, C = 0 > for is_master(), B = 1, C = 0 > for join_finished, B = 1, C = 0 > so now B is elected to be master and responsible to send_reponse() > 4: everything goes okay. > > What my patch didn't fix is for following scene: > > we have 1 nodes in the cluster (A), so > > init: nr_cpg_nodes = 1, A is the master > C joins > 1: A, C get a join_messgae(C) > for nr_cpg_nodes, A = 1, C = 0 > 2: A crashed before sending response > 3: B, C get a leve_message(A) > for nr_cpg_nodes, C = 0 > for is_master(), C = 0 > for join_finished C = 0 > 4: now C blocks for ever > > By writing this example, I found that my next fix has something wrong > because the new joining node will be mistaken as master. Yes, this is the case I was describing. Our patch handles this case correctly too. S. |