[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Liu Yuan namei.unix at gmail.com
Mon Oct 31 12:37:51 CET 2011


On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:

> At Mon, 31 Oct 2011 18:15:06 +0800,
> Liu Yuan wrote:
>>
>> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
>>
>>> At Tue, 25 Oct 2011 15:06:05 +0800,
>>> Liu Yuan wrote:
>>>>
>>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>>>
>>>>> From: Yibin Shen <zituan at taobao.com>
>>>>>
>>>>> In some situations, sheep may be disconnected from corosync momentarily
>>>>> while both sheep and corosync keep running and neither of them exits.
>>>>> The disconnected sheep may then receive a confchg message from corosync
>>>>> notifying it that this sheep itself has left.
>>>>> That leads to a network partition; this patch fixes it.
>>>>>
>>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>>>> ---
>>>>>  sheep/group.c |    3 +++
>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>> index e22dabc..ab5a9f0 100644
>>>>> --- a/sheep/group.c
>>>>> +++ b/sheep/group.c
>>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>>>  	struct work_leave *w = NULL;
>>>>>  	int i, size;
>>>>>  
>>>>> +	if (node_cmp(left, &sys->this_node) == 0)
>>>>> +		panic("BUG: this node can't be on the left list\n");
>>>>> +
>>>>
>>>>
>>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
>>>> I should have exited.\n"? The output will be seen by administrators,
>>>> not only programmers.
>>>
>>> Applied after modifying output text, thanks!
>>>
>>> Kazutaka
>>
>>
>> Kazutaka,
>> 	Maybe we should not panic when it becomes a single-node cluster.
>> The node will change into the HALT state, which does no harm to its data.
> 
> It is much better.  Currently, Sheepdog kills a minority cluster in
> __sd_leave() when a network partition occurs because it is the simplest
> solution to keep data consistency.
> 
> But this looks like a different issue from this patch.  Does corosync
> include the local node in the left list when a network partition occurs?
> If so, we should handle it in the corosync cluster driver because it
> looks like a corosync-specific issue to me.
> 

I am not sure, but if corosync includes the local node in the left list,
that would be a bug in corosync.

Let's assume three nodes (a, b, c).
I suspect that the left message is meant for n(b,c), but after n(a)
rejoins, for whatever reason, the message is still being broadcast, and
n(a) receives it by mistake.
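
To make the scenario concrete, here is a minimal sketch of the check a
receiver could run on a left list, assuming the stale message names n(a)
itself as the node that left. The struct node_id type and the helpers
node_id_cmp() and left_list_has_self() are hypothetical, simplified
stand-ins; they are not the actual sheepdog_node_list_entry, node_cmp(),
or any corosync API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node identity (not the real sheepdog struct). */
struct node_id {
	unsigned char addr[16];
	unsigned short port;
};

/* Compare two identities: 0 if equal, nonzero otherwise. */
static int node_id_cmp(const struct node_id *a, const struct node_id *b)
{
	int ret = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (ret)
		return ret;
	if (a->port != b->port)
		return a->port < b->port ? -1 : 1;
	return 0;
}

/*
 * Return true if the local node appears in the left list. In the
 * scenario above, n(a) would see this return true for the stale
 * message, while n(b) and n(c) would see false.
 */
static bool left_list_has_self(const struct node_id *left, size_t nr_left,
			       const struct node_id *self)
{
	for (size_t i = 0; i < nr_left; i++)
		if (node_id_cmp(&left[i], self) == 0)
			return true;
	return false;
}
```

Whether the receiver should then panic, halt, or simply drop the message
is exactly the open question in this thread; the sketch only shows the
detection step.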

Yunkai is going to patch corosync for related bugs that occur when a node
leaves and rejoins the old ring. With those bug fixes applied, let's see
whether this kind of problem (a node receiving a message saying that it
has itself left) still exists.

Thanks,
Yuan


