[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Mon Oct 31 14:44:34 CET 2011

IMHO,

If cluster partitioning happens, we should introduce some
STONITH technique to avoid data corruption.
Generally, the steps are as follows:
1. partitioning detected
2. wait some interval to confirm the partition
3. vote to STONITH

STONITH can either truly shut the node down or just stop those nodes
from running. We can discuss this further.
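
To make the three steps above concrete, here is a rough sketch in C. It is
only an illustration under my own assumptions: the names part_tracker,
part_observe() and part_vote_passes(), the confirmation interval in seconds,
and the strict-majority voting rule are all hypothetical and do not exist in
sheepdog or corosync.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for steps 1 and 2; not part of sheepdog. */
enum part_state {
	PART_NONE,       /* no partition observed */
	PART_SUSPECTED,  /* step 1: partition detected, not yet confirmed */
	PART_CONFIRMED,  /* step 2: partition lasted the whole interval */
};

struct part_tracker {
	enum part_state state;
	unsigned int detected_at;   /* time of first detection, in seconds */
	unsigned int confirm_secs;  /* assumed confirmation interval */
};

/*
 * Feed one observation into the tracker.  `partitioned` says whether the
 * membership layer currently reports a partition; `now` is a monotonic
 * timestamp in seconds.
 */
static enum part_state part_observe(struct part_tracker *t, bool partitioned,
				    unsigned int now)
{
	if (!partitioned) {
		/* The partition is gone: forget any earlier suspicion. */
		t->state = PART_NONE;
		return t->state;
	}
	if (t->state == PART_NONE) {
		t->state = PART_SUSPECTED;
		t->detected_at = now;
	}
	if (t->state == PART_SUSPECTED &&
	    now - t->detected_at >= t->confirm_secs)
		t->state = PART_CONFIRMED;
	return t->state;
}

/*
 * Step 3: votes[i] is true when node i voted to fence the target.
 * Fencing is allowed only with a strict majority of all nr_nodes.
 */
static bool part_vote_passes(const bool *votes, size_t nr_nodes)
{
	size_t yes = 0;

	for (size_t i = 0; i < nr_nodes; i++)
		if (votes[i])
			yes++;
	return yes * 2 > nr_nodes;
}
```

The caller would run the actual STONITH action (power off or suspend) only
once part_observe() reports PART_CONFIRMED and part_vote_passes() returns
true; what that action should be is the part still open for discussion.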

On Mon, Oct 31, 2011 at 8:36 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 10/31/2011 07:49 PM, MORITA Kazutaka wrote:
>
>> At Mon, 31 Oct 2011 19:37:51 +0800,
>> Liu Yuan wrote:
>>>
>>> On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:
>>>
>>>> At Mon, 31 Oct 2011 18:15:06 +0800,
>>>> Liu Yuan wrote:
>>>>>
>>>>> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
>>>>>
>>>>>> At Tue, 25 Oct 2011 15:06:05 +0800,
>>>>>> Liu Yuan wrote:
>>>>>>>
>>>>>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>>>>>>
>>>>>>>> From: Yibin Shen <zituan at taobao.com>
>>>>>>>>
>>>>>>>> In some situations, sheep may be disconnected from corosync momentarily.
>>>>>>>> Both sheep and corosync keep running and neither of them exits,
>>>>>>>> so the disconnected sheep may receive a confchg message from
>>>>>>>> corosync notifying it that this sheep itself has left.
>>>>>>>> That leads to a network partition; this patch fixes it.
>>>>>>>>
>>>>>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>>>>>>> ---
>>>>>>>>  sheep/group.c |    3 +++
>>>>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>>>>> index e22dabc..ab5a9f0 100644
>>>>>>>> --- a/sheep/group.c
>>>>>>>> +++ b/sheep/group.c
>>>>>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>>>>>>         struct work_leave *w = NULL;
>>>>>>>>         int i, size;
>>>>>>>>
>>>>>>>> +       if (node_cmp(left, &sys->this_node) == 0)
>>>>>>>> +               panic("BUG: this node can't be on the left list\n");
>>>>>>>> +
>>>>>>>
>>>>>>>
>>>>>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
>>>>>>> I should have exited.\n"? The output will be seen by
>>>>>>> administrators, not only programmers.
>>>>>>
>>>>>> Applied after modifying output text, thanks!
>>>>>>
>>>>>> Kazutaka
>>>>>
>>>>>
>>>>> Kazutaka,
>>>>>    Maybe we should not panic when it becomes a single-node cluster.
>>>>> The node will change into the HALT state, which doesn't do any harm to its data.
>>>>
>>>> It is much better.  Currently, Sheepdog kills a minority cluster in
>>>> __sd_leave() when a network partition occurs because it is the simplest
>>>> solution to keep data consistent.
>>>>
>>>> But this looks like a different issue from this patch.  Does corosync
>>>> include the local node in the left list when a network partition occurs?  If
>>>> so, we should handle it in the corosync cluster driver because it
>>>> looks like a corosync-specific issue to me.
>>>>
>>>
>>> I am not sure, but if corosync includes the local node in the left list, it
>>> should be a bug in corosync.
>>>
>>> Let's assume three nodes (a,b,c).
>>> I suspect that the leave message is meant for n(b,c), but after n(a)
>>> rejoins, for whatever reason, the message is still being broadcast, and
>>> n(a) just receives it wrongly.
>>
>> IIUC, n(a) should receive the leave message of n(b,c).
>
> Yes, but n(a) should not receive the leave_message(a) which is intended
> for n(b,c).
>
> so the correct sequence should be:
>
> network partition happens
> lm(a) -> n(b,c), lm(b,c) -> n(a).
> then n(a) rejoins.
> jm(a) -> n(a,b,c)
>
> Anyway, I am not sure, because I haven't looked at the log.
>
> Thanks,
> Yuan