[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Yunkai Zhang qiushu.zyk at taobao.com
Tue Nov 1 12:18:00 CET 2011


Hi guys,

Sorry for my late reply.

This issue looks like a corosync bug; I'm going to solve it.

The reason is that when one of the corosync nodes runs into the 'FAIL' state,
it adds itself to its failed_list and multicasts a JoinMsg containing this
failed_list to the cluster. Normally, the other corosync nodes will not receive
this JoinMsg because a network partition has occurred, and this issue will not
be triggered. But if the network recovers quickly enough that the other
corosync nodes do receive this JoinMsg, then all of the corosync nodes will
form a new ring based on this failed_list. The result is that the new ring
will create and send a confchg (which excludes that corosync node) to each
sheep.
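
To make the symptom concrete: what each sheep finally sees is a CPG
configuration change whose left_list contains the sheep's own node. A minimal,
hypothetical sketch of detecting that at the corosync CPG layer could look like
the code below. The callback name example_confchg() is made up for
illustration and this is not the actual sheep driver code; cpg_local_get(),
struct cpg_address and the confchg callback signature are the standard
corosync CPG API.

#include <stdio.h>
#include <stdlib.h>
#include <corosync/cpg.h>

/*
 * Sketch only: a CPG configuration-change callback that checks whether
 * the local node itself appears in left_list, which is exactly the
 * bogus confchg described above.
 */
static void example_confchg(cpg_handle_t handle,
                            const struct cpg_name *group_name,
                            const struct cpg_address *member_list,
                            size_t member_list_entries,
                            const struct cpg_address *left_list,
                            size_t left_list_entries,
                            const struct cpg_address *joined_list,
                            size_t joined_list_entries)
{
        unsigned int local_nodeid;
        size_t i;

        /* Ask corosync for our own node id. */
        if (cpg_local_get(handle, &local_nodeid) != CS_OK)
                return;

        for (i = 0; i < left_list_entries; i++) {
                if (left_list[i].nodeid == local_nodeid) {
                        /*
                         * We are being told that we ourselves have left
                         * the ring.  Bail out instead of silently creating
                         * a network partition.
                         */
                        fprintf(stderr, "left_list contains the local node, "
                                "leaving the cluster\n");
                        exit(1);
                }
        }

        /* ... normal join/leave handling would follow here ... */
}

Whether the right reaction is to exit, to rejoin, or simply to ignore such a
confchg is exactly what is being discussed in the thread below.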


On Mon, Oct 31, 2011 at 8:28 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 10/31/2011 07:37 PM, Liu Yuan wrote:
>
>> On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:
>>
>>> At Mon, 31 Oct 2011 18:15:06 +0800,
>>> Liu Yuan wrote:
>>>>
>>>> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
>>>>
>>>>> At Tue, 25 Oct 2011 15:06:05 +0800,
>>>>> Liu Yuan wrote:
>>>>>>
>>>>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>>>>>
>>>>>>> From: Yibin Shen <zituan at taobao.com>
>>>>>>>
>>>>>>> In some situations, sheep may be disconnected from corosync momentarily;
>>>>>>> at the same time, both sheep and corosync keep running and
>>>>>>> neither of them exits. The disconnected sheep may then receive a confchg
>>>>>>> message from corosync notifying it that this sheep has left.
>>>>>>> That leads to a network partition; this patch fixes it.
>>>>>>>
>>>>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>>>>>> ---
>>>>>>>  sheep/group.c |    3 +++
>>>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>>>> index e22dabc..ab5a9f0 100644
>>>>>>> --- a/sheep/group.c
>>>>>>> +++ b/sheep/group.c
>>>>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>>>>>          struct work_leave *w = NULL;
>>>>>>>          int i, size;
>>>>>>>
>>>>>>> +        if (node_cmp(left, &sys->this_node) == 0)
>>>>>>> +                panic("BUG: this node can't be on the left list\n");
>>>>>>> +
>>>>>>
>>>>>>
>>>>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
>>>>>> I should have exited.\n"? The output will be seen by
>>>>>> administrators, not only programmers.
>>>>>
>>>>> Applied after modifying output text, thanks!
>>>>>
>>>>> Kazutaka
>>>>
>>>>
>>>> Kazutaka,
>>>>     Maybe we should not panic when it becomes a single-node cluster.
>>>> The node will change into the HALT state, which does no harm to its data.
>>>
>>> That would be much better.  Currently, Sheepdog kills a minority cluster in
>>> __sd_leave() when a network partition occurs because it is the simplest
>>> way to keep data consistent.
>>>
>>> But this looks like a different issue from this patch.  Does corosync
>>> include the local node in the left list when a network partition occurs?
>>> If so, we should handle it in the corosync cluster driver because it looks
>>> like a corosync-specific issue to me.
>>>
>>
>> I am not sure, but if corosync includes the local node in the left list, it
>> should be a bug in corosync.
>>
>> Let's assume three nodes (a, b, c).
>> I suspect that the leave message is meant for n(b) and n(c), but after n(a)
>> rejoins, for whatever reason, the message is still being broadcast, and
>> n(a) wrongly receives it.
>>
>> Yunkai is going to patch corosync for related bugs that occur when a
>> corosync node leaves and rejoins the old ring. With those bug fixes, let's
>> see if this kind of problem (a node receiving a message saying that it
>> itself has left) still exists.
>>
>
>
> Yunkai,
>        would you please shed some light on this issue?
>
> Thanks,
> Yuan



-- 
Yunkai Zhang
Work at Taobao


