[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request

Liu Yuan namei.unix at gmail.com
Sat Apr 28 11:58:44 CEST 2012


On 04/28/2012 05:54 PM, Liu Yuan wrote:

> On 04/28/2012 05:49 PM, Shevek wrote:
> 
>> On Sat, 2012-04-28 at 17:34 +0800, Liu Yuan wrote:
>>> On 04/28/2012 05:22 PM, Shevek wrote:
>>>
>>>>>> -               if (join_finished)
>>>>>> +               if (join_finished || is_master(&cevent->sender))
>>>> I believe this patch does NOT work:
>>>>
>>>> 1) is_master returns -1 for not-master, 0 means the given node is
>>>> master, so this is apparently a misuse of is_master.
>>>>
>>>> However, even worse:
>>>>
>>>> 2) If we never got a join response, then line 388:
>>>>     388 if (!cevent->blocked && cpg_node_equal(&cevent->sender,
>>>> &this_node)) {
>>>> never triggered, therefore we have never set join_finished, and
>>>> nr_cpg_nodes is always 0 (look for every place where it got set), so
>>>> is_master() is always 0.
>>>>
>>>> So, by inspection, this patch can never work, for both reasons.
>>>>
>>>
>>>
>>> Oh, yes. Thanks for pointing it out.
>>>
>>> But what if set set
>>>
>>>                 if (join_finished || (is_master(&cevent->sender >= 0)))
>>>                         done = __corosync_dispatch_one(cevent);
>>>                 else
>>>                         done = !cevent->blocked;
>>>
>>> Then, the master will be called for sure, and would give other nodes a
>>> join response.
>>
>> As I said before under (2), is_master() will always return 0, since the
>> only assignments to nr_cpg_nodes are within __corosync_dispatch_one()
>> (which has never yet been called) or when first_node == 1 (which never
>> happened).
>>
>> So this new patch is still invalid.
>>
> 
> 
> confchg event will be broadcast to all the nodes.
> 
> for our case, if is_master() returns >=0, it means the very node *is*
> the master, then will call into __corosync_dispatch_one[1], where then
> it will call into send_message(response).
> 
> so your assumption for (2) isn't valid.
> 
> [1]
> if (join_finished || (is_master(&cevent->sender >= 0)))
> 	done = __corosync_dispatch_one(cevent);
> 
> Thanks,
> Yuan


Typo, should be  (is_master(&cevent->sender) >= 0)



More information about the sheepdog mailing list