[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request
Liu Yuan
namei.unix at gmail.com
Sat Apr 28 20:05:33 CEST 2012
On 04/29/2012 01:09 AM, Shevek wrote:
> On Sat, 2012-04-28 at 18:15 +0800, Liu Yuan wrote:
>> On 04/28/2012 05:58 PM, Shevek wrote:
>>
>>> Wrong. Read is_master():
>>>
>>> 243 static int is_master(struct cpg_node *node)
>>> 244 {
>>> 245 	int i;
>>> 246 	struct cpg_node *n = node;
>>> 247 	if (!n)
>>> 248 		n = &this_node;
>>> 249 	if (nr_cpg_nodes == 0)
>>> 250 		/* this node should be the first cpg node */
>>> 251 		return 0;
>>>
>>> ^^^ This line is always hit because nr_cpg_nodes is always 0. Nobody set
>>> it to anything other than 0.
>>>
>>> 252
>>> 253 	for (i = 0; i < SD_MAX_NODES; i++) {
>>> 254 		if (!cpg_nodes[i].gone)
>>> 255 			break;
>>> 256 	}
>>> 257
>>> 258 	if (cpg_node_equal(&cpg_nodes[i], n))
>>> 259 		return i;
>>> 260 	return -1;
>>> 261 }
>>>
>>> S.
>>
>>
>> I don't think so.
>>
>> see the example below (with only my previous c4e3559758b2e fix)
>>
>> we have 2 nodes in the cluster (A, B), so
>>
>> init: nr_cpg_nodes = 2, A is the master
>> C joins
>> 1: A, B, C get a join_message(C)
>> for nr_cpg_nodes, A = 2, B = 2, C = 0
>
> Now is_master() returns 0 in C for any argument, so C elects itself
> master, and the patch is still wrong. Of course, since it isn't
> cpg_nodes[0], it doesn't stay master for long, but this is still wrong.
>
I think I have clearly said that B is elected as the master; C has
nothing to do with the election in this scenario. C is blocked until B
sends the response letting C join the cluster, so the is_master() call
on C is effectively a no-op.
Actually, if any node in the cluster is already in the finished_join
state, there would be no hang at all; that is what my original patch
already fixes, for the real case.
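To make that concrete, here is a small standalone sketch of B's view
(simplified, with a made-up struct cpg_node, node names, and a strcmp
comparison; it only mirrors the shape of the real is_master(), it is
not the driver code): after the leave of A, the first non-gone entry
in cpg_nodes is B itself, so is_master() says B is the master and B
sends the join response to C.

#include <stdio.h>
#include <string.h>

#define SD_MAX_NODES 8

/* simplified stand-in for the real struct cpg_node */
struct cpg_node {
	char name[8];
	int gone;
};

static struct cpg_node cpg_nodes[SD_MAX_NODES];
static int nr_cpg_nodes;

/* same shape as the is_master() quoted above, minus the NULL handling */
static int is_master(struct cpg_node *n)
{
	int i;

	if (nr_cpg_nodes == 0)
		/* this node should be the first cpg node */
		return 0;

	for (i = 0; i < SD_MAX_NODES; i++)
		if (!cpg_nodes[i].gone)
			break;

	if (i < SD_MAX_NODES && !strcmp(cpg_nodes[i].name, n->name))
		return i;
	return -1;
}

int main(void)
{
	struct cpg_node b = { "B", 0 };

	/* B's view after init: A and B are members, A is the master */
	strcpy(cpg_nodes[0].name, "A");
	strcpy(cpg_nodes[1].name, "B");
	nr_cpg_nodes = 2;

	/* step 3: B gets the leave message for A */
	cpg_nodes[0].gone = 1;

	if (is_master(&b) >= 0)
		printf("B is the master now and sends the join response to C\n");
	return 0;
}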
But what we are trying to solve now looks quite artificial to me,
unless you can show me a real case for it.
> You can NOT call is_master() before cpg_nodes is set up correctly. Any
> patch using is_master() to fix this bug is automatically wrong.
>
> Our patch handles this case correctly.
No, your patch is a completely wrong workaround, not a fix. You just
mark every node as the 'first node', so every node becomes the master
and they all send the response to each other.
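To show what I mean (again just an illustration, not your actual
patch): if nr_cpg_nodes is left at 0 on every node, the early return
in is_master() makes each of them elect itself.

#include <stdio.h>

/* the early-return branch of is_master(), isolated */
static int is_master(int nr_cpg_nodes)
{
	if (nr_cpg_nodes == 0)
		/* this node should be the first cpg node */
		return 0;
	return -1;	/* real membership check elided */
}

int main(void)
{
	const char *node[] = { "A", "B", "C" };
	int nr_cpg_nodes_on[] = { 0, 0, 0 };	/* every node treated as "first node" */
	int i;

	for (i = 0; i < 3; i++)
		if (is_master(nr_cpg_nodes_on[i]) >= 0)
			printf("%s elects itself master and sends the response\n",
			       node[i]);
	return 0;
}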
>
>> 2: A crashed before sending response
>> 3: B, C get a leave_message(A)
>> for nr_cpg_nodes, B = 2, C = 0
>> for is_master(), B = 1, C = 0
>> for join_finished, B = 1, C = 0
>> so now B is elected to be master and responsible to send_response()
>> 4: everything goes okay.
>>
>> What my patch didn't fix is the following scenario:
>>
>> we have 1 node in the cluster (A), so
>>
>> init: nr_cpg_nodes = 1, A is the master
>> C joins
>> 1: A, C get a join_message(C)
>> for nr_cpg_nodes, A = 1, C = 0
>> 2: A crashed before sending response
>> 3: C gets a leave_message(A)
>> for nr_cpg_nodes, C = 0
>> for is_master(), C = 0
>> for join_finished C = 0
>> 4: now C blocks forever
>>
>> By writing out this example, I found that my next fix has a problem:
>> the newly joining node will be mistaken for the master.
>
> Yes, this is the case I was describing. Our patch handles this case
> correctly too.
As I said above, this is TOO ARTIFICIAL. Once the cluster is set up,
nodes joining afterwards won't cause any problem, at least
theoretically. If they do, we'd love to solve that real-case problem
rather than spend time on these made-up cases.
Actually, we have been testing with around 1000 nodes, killing and
rejoining nearly 100 nodes in one go under heavy in-flight IO requests,
and we haven't hit a hang problem.
And I think my RFC patch can solve the second case. If there is a
better patch that doesn't bring in any complex mechanism, I'd love to
merge any *reasonable* patch to handle those made-up scenarios.
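For reference, the direction I have in mind for the second case is
roughly the following. This is only a sketch of the idea, not the RFC
patch itself, and the names (handle_leave, members_alive,
join_finished as a plain flag) are made up for illustration: when the
joining node sees the last remaining member leave before any join
response has arrived, nobody is left to answer it, so it can fall back
to being the first node instead of blocking forever.

#include <stdio.h>

static int join_finished;	/* 0 on C: it never got a join response */
static int members_alive = 1;	/* C only knows about A so far */

/* what C could do when it processes a leave message */
static void handle_leave(const char *who)
{
	members_alive--;
	printf("got leave message for %s\n", who);

	if (!join_finished && members_alive == 0) {
		/* nobody is left that could answer our join request */
		printf("no member left to answer the join request, "
		       "restarting as the first node of the cluster\n");
		join_finished = 1;
	}
}

int main(void)
{
	handle_leave("A");	/* step 3 of the second scenario */
	return 0;
}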
Thanks,
Yuan