[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Mon Oct 31 13:28:54 CET 2011

On 10/31/2011 07:37 PM, Liu Yuan wrote:

> On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:
> 
>> At Mon, 31 Oct 2011 18:15:06 +0800,
>> Liu Yuan wrote:
>>>
>>> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
>>>
>>>> At Tue, 25 Oct 2011 15:06:05 +0800,
>>>> Liu Yuan wrote:
>>>>>
>>>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>>>>
>>>>>> From: Yibin Shen <zituan at taobao.com>
>>>>>>
>>>>>> In some situations, sheep may be momentarily disconnected from corosync
>>>>>> while both sheep and corosync keep running and neither of them exits.
>>>>>> The disconnected sheep may then receive a confchg message from corosync
>>>>>> notifying it that this sheep itself has left.
>>>>>> That leads to a network partition; this patch fixes it.
>>>>>>
>>>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>>>>> ---
>>>>>>  sheep/group.c |    3 +++
>>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>>> index e22dabc..ab5a9f0 100644
>>>>>> --- a/sheep/group.c
>>>>>> +++ b/sheep/group.c
>>>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>>>>  	struct work_leave *w = NULL;
>>>>>>  	int i, size;
>>>>>>  
>>>>>> +	if (node_cmp(left, &sys->this_node) == 0)
>>>>>> +		panic("BUG: this node can't be on the left list\n");
>>>>>> +
>>>>>
>>>>>
>>>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
>>>>> I should have exited.\n"? The output will be seen by
>>>>> administrators, not only programmers.
>>>>
>>>> Applied after modifying output text, thanks!
>>>>
>>>> Kazutaka
>>>
>>>
>>> Kazutaka,
>>> 	Maybe we should not panic when it becomes a single-node cluster.
>>> The node will change into the HALT state, which doesn't do any harm to its data.
>>
>> That is much better.  Currently, Sheepdog kills a minority cluster in
>> __sd_leave() when a network partition occurs because it is the simplest
>> way to keep data consistent.
>>
>> But this looks like a different issue from this patch.  Does corosync
>> include the local node in the left list when a network partition occurs?  If
>> so, we should handle it in the corosync cluster driver because it
>> looks like a corosync-specific issue to me.
>>
> 
> I am not sure, but if corosync includes the local node in the left list, it
> should be a bug in corosync.
> 
> Let's assume three nodes (a,b,c).
> I suspect that the left message is for n(b,c), but after n(a)
> rejoins, for whatever reason, the message is still being broadcast, and
> n(a) just receives it by mistake.
> 
> Yunkai is going to patch corosync for related bugs that occur when a
> corosync node leaves and rejoins the old ring. With those bug fixes, let's
> see if this kind of problem (a node receiving a message saying that it has
> left) still exists.
> 

Yunkai,
	would you please shed some light on this issue?

Thanks,
Yuan
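
For reference, the check discussed above (does the local node appear in the
left list?) could be sketched roughly as below. This is only an illustration,
not the code in sheep/group.c: struct node_entry, node_entry_cmp() and
self_in_left_list() are hypothetical names, and the sketch assumes the left
list arrives as a plain array. A cluster driver could call such a helper
before passing the leave event up, and decide there whether to exit or halt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sheepdog's node entry; not the real layout. */
struct node_entry {
	uint8_t addr[16];
	uint16_t port;
};

/* Compare two nodes by address, then by port; returns 0 when they match. */
static int node_entry_cmp(const struct node_entry *a,
			  const struct node_entry *b)
{
	int ret = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (ret)
		return ret;
	return (a->port > b->port) - (a->port < b->port);
}

/*
 * Return 1 if the local node is reported in the left list, 0 otherwise.
 * The caller decides what to do with a positive result (exit, halt, ...).
 */
static int self_in_left_list(const struct node_entry *self,
			     const struct node_entry *left, size_t nr_left)
{
	size_t i;

	for (i = 0; i < nr_left; i++)
		if (node_entry_cmp(self, &left[i]) == 0)
			return 1;
	return 0;
}
```

Keeping the check separate from the reaction leaves the policy question in
this thread (panic versus changing into the HALT state) open to the caller.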