[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request

Shevek shevek at anarres.org
Sat Apr 28 11:58:21 CEST 2012


On Sat, 2012-04-28 at 17:54 +0800, Liu Yuan wrote:
> On 04/28/2012 05:49 PM, Shevek wrote:
> 
> > On Sat, 2012-04-28 at 17:34 +0800, Liu Yuan wrote:
> >> On 04/28/2012 05:22 PM, Shevek wrote:
> >>
> >>>>> -               if (join_finished)
> >>>>> +               if (join_finished || is_master(&cevent->sender))
> >>> I believe this patch does NOT work:
> >>>
> >>> 1) is_master returns -1 for not-master, 0 means the given node is
> >>> master, so this is apparently a misuse of is_master.
> >>>
> >>> However, even worse:
> >>>
> >>> 2) If we never got a join response, then line 388:
> >>>     388 if (!cevent->blocked && cpg_node_equal(&cevent->sender,
> >>> &this_node)) {
> >>> never triggered, therefore we have never set join_finished, and
> >>> nr_cpg_nodes is always 0 (look for every place where it got set), so
> >>> is_master() is always 0.
> >>>
> >>> So, by inspection, this patch can never work, for both reasons.
> >>>
> >>
> >>
> >> Oh, yes. Thanks for pointing it out.
> >>
> >> But what if set set
> >>
> >>                 if (join_finished || (is_master(&cevent->sender >= 0)))
> >>                         done = __corosync_dispatch_one(cevent);
> >>                 else
> >>                         done = !cevent->blocked;
> >>
> >> Then, the master will be called for sure, and would give other nodes a
> >> join response.
> > 
> > As I said before under (2), is_master() will always return 0, since the
> > only assignments to nr_cpg_nodes are within __corosync_dispatch_one()
> > (which has never yet been called) or when first_node == 1 (which never
> > happened).
> > 
> > So this new patch is still invalid.
> > 
> 
> 
> confchg event will be broadcast to all the nodes.
> 
> for our case, if is_master() returns >=0, it means the very node *is*
> the master, then will call into __corosync_dispatch_one[1], where then
> it will call into send_message(response).
> 
> so your assumption for (2) isn't valid.

Wrong. Read is_master():

    243 static int is_master(struct cpg_node *node)
    244 {
    245     int i;
    246     struct cpg_node *n = node;
    247     if (!n)
    248         n = &this_node;
    249     if (nr_cpg_nodes == 0)
    250         /* this node should be the first cpg node */
    251         return 0;

^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set
it to anything other than 0.

    252 
    253     for (i = 0; i < SD_MAX_NODES; i++) {
    254         if (!cpg_nodes[i].gone)
    255             break;
    256     }
    257 
    258     if (cpg_node_equal(&cpg_nodes[i], n))
    259         return i;
    260     return -1;
    261 }

S.

> [1]
> if (join_finished || (is_master(&cevent->sender >= 0)))
> 	done = __corosync_dispatch_one(cevent);
> 
> Thanks,
> Yuan
> 





More information about the sheepdog mailing list