[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request
Shevek
shevek at anarres.org
Sat Apr 28 19:09:26 CEST 2012
On Sat, 2012-04-28 at 18:15 +0800, Liu Yuan wrote:
> On 04/28/2012 05:58 PM, Shevek wrote:
>
> > Wrong. Read is_master():
> >
> > 243 static int is_master(struct cpg_node *node)
> > 244 {
> > 245 int i;
> > 246 struct cpg_node *n = node;
> > 247 if (!n)
> > 248 n = &this_node;
> > 249 if (nr_cpg_nodes == 0)
> > 250 /* this node should be the first cpg node */
> > 251 return 0;
> >
> > ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set
> > it to anything other than 0.
> >
> > 252
> > 253 for (i = 0; i < SD_MAX_NODES; i++) {
> > 254 if (!cpg_nodes[i].gone)
> > 255 break;
> > 256 }
> > 257
> > 258 if (cpg_node_equal(&cpg_nodes[i], n))
> > 259 return i;
> > 260 return -1;
> > 261 }
> >
> > S.
>
>
> I don't think so.
>
> see below example (with only my previous c4e3559758b2e fix)
>
> we have 2 nodes in the cluster (A, B), so
>
> init: nr_cpg_nodes = 2, A is the master
> C joins
> 1: A, B, C get a join_messgae(C)
> for nr_cpg_nodes, A = 2, B = 2, C = 0
Now is_master() returns 0 in C for any argument, so C elects itself
master, and the patch is still wrong. Of course, since it isn't
cpg_nodes[0], it doesn't stay master for long, but this is still wrong.
You can NOT call is_master() before cpg_nodes is set up correctly. Any
patch using is_master() to fix this bug is automatically wrong.
Our patch handles this case correctly.
> 2: A crashed before sending response
> 3: B, C get a leve_message(A)
> for nr_cpg_nodes, B = 2, C = 0
> for is_master(), B = 1, C = 0
> for join_finished, B = 1, C = 0
> so now B is elected to be master and responsible to send_reponse()
> 4: everything goes okay.
>
> What my patch didn't fix is for following scene:
>
> we have 1 nodes in the cluster (A), so
>
> init: nr_cpg_nodes = 1, A is the master
> C joins
> 1: A, C get a join_messgae(C)
> for nr_cpg_nodes, A = 1, C = 0
> 2: A crashed before sending response
> 3: B, C get a leve_message(A)
> for nr_cpg_nodes, C = 0
> for is_master(), C = 0
> for join_finished C = 0
> 4: now C blocks for ever
>
> By writing this example, I found that my next fix has something wrong
> because the new joining node will be mistaken as master.
Yes, this is the case I was describing. Our patch handles this case
correctly too.
S.
More information about the sheepdog
mailing list