On 04/29/2012 01:09 AM, Shevek wrote: > On Sat, 2012-04-28 at 18:15 +0800, Liu Yuan wrote: >> On 04/28/2012 05:58 PM, Shevek wrote: >> >>> Wrong. Read is_master(): >>> >>> 243 static int is_master(struct cpg_node *node) >>> 244 { >>> 245 int i; >>> 246 struct cpg_node *n = node; >>> 247 if (!n) >>> 248 n = &this_node; >>> 249 if (nr_cpg_nodes == 0) >>> 250 /* this node should be the first cpg node */ >>> 251 return 0; >>> >>> ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set >>> it to anything other than 0. >>> >>> 252 >>> 253 for (i = 0; i < SD_MAX_NODES; i++) { >>> 254 if (!cpg_nodes[i].gone) >>> 255 break; >>> 256 } >>> 257 >>> 258 if (cpg_node_equal(&cpg_nodes[i], n)) >>> 259 return i; >>> 260 return -1; >>> 261 } >>> >>> S. >> >> >> I don't think so. >> >> see below example (with only my previous c4e3559758b2e fix) >> >> we have 2 nodes in the cluster (A, B), so >> >> init: nr_cpg_nodes = 2, A is the master >> C joins >> 1: A, B, C get a join_messgae(C) >> for nr_cpg_nodes, A = 2, B = 2, C = 0 > > Now is_master() returns 0 in C for any argument, so C elects itself > master, and the patch is still wrong. Of course, since it isn't > cpg_nodes[0], it doesn't stay master for long, but this is still wrong. > I think I have clearly said B is elected to be the master, nothing to do with C in this scenario. C is blocked until B send response to let C join the cluster, is_master() is called in C is actually a NULL operation. Actually, if there is any node in the cluster that is already in finished_join state, there wouldn't be hung at all, that is what my origin patch fixes already, for the real case. But now what we are trying to solve, looks to me is quite artificial unless you show me any real case for it. > You can NOT call is_master() before cpg_nodes is set up correctly. Any > patch using is_master() to fix this bug is automatically wrong. > > Our patch handles this case correctly. No, your patch is completely wrong workaround, not a fix. You just set every node as 'first node' and all nodes become the master and send the response to each other. > >> 2: A crashed before sending response >> 3: B, C get a leve_message(A) >> for nr_cpg_nodes, B = 2, C = 0 >> for is_master(), B = 1, C = 0 >> for join_finished, B = 1, C = 0 >> so now B is elected to be master and responsible to send_reponse() >> 4: everything goes okay. >> >> What my patch didn't fix is for following scene: >> >> we have 1 nodes in the cluster (A), so >> >> init: nr_cpg_nodes = 1, A is the master >> C joins >> 1: A, C get a join_messgae(C) >> for nr_cpg_nodes, A = 1, C = 0 >> 2: A crashed before sending response >> 3: B, C get a leve_message(A) >> for nr_cpg_nodes, C = 0 >> for is_master(), C = 0 >> for join_finished C = 0 >> 4: now C blocks for ever >> >> By writing this example, I found that my next fix has something wrong >> because the new joining node will be mistaken as master. > > Yes, this is the case I was describing. Our patch handles this case > correctly too. As I said above, this is TOO ARTIFICIAL. If the cluster is set up, the successive nodes joining won't cause any problem, at least theoretically. If any, we'd love to solve this real case problem rather than spend time on those made-up cases. Actually, we have been testing around 1000 nodes with nearly 100 nodes killing and joining in one go during heavy IO request in fly, we haven't met hang problem. And I think my RFC patch can solve the second case. If there is any better patch that doesn't bring in any complex mechanism, I'd love to merge any *reasonable* patch to handle those made-up scenes. Thanks, Yuan |