[Sheepdog] [PATCH v2] sheep: fix a network partition issue

Chaos Eternal chaoseternal at shlug.org
Tue Oct 25 09:09:46 CEST 2011


Tragic programmers...

On Tue, Oct 25, 2011 at 3:06 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>
>> From: Yibin Shen <zituan at taobao.com>
>>
>> In some situations, sheep may be momentarily disconnected from corosync.
>> Both sheep and corosync keep running and neither of them exits, so the
>> disconnected sheep may then receive a confchg message from corosync
>> notifying it that this sheep itself has left.
>> That leads to a network partition. This patch fixes it.
>>
>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>> ---
>>  sheep/group.c |    3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/sheep/group.c b/sheep/group.c
>> index e22dabc..ab5a9f0 100644
>> --- a/sheep/group.c
>> +++ b/sheep/group.c
>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>       struct work_leave *w = NULL;
>>       int i, size;
>>
>> +     if (node_cmp(left, &sys->this_node) == 0)
>> +             panic("BUG: this node can't be on the left list\n");
>> +
>
>
> Hmm, the panic output looks confusing. How about "Network Partition Bug:
> I should have exited.\n"? The output will be seen by administrators,
> not only programmers.
>
> Thanks,
> Yuan
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
>
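
For readers following the thread, the check in the patch can be sketched as a
standalone snippet. This is only a minimal illustration, not the sheepdog
code: struct node_entry, node_entry_cmp() and leave_sanity_check() are
hypothetical stand-ins for struct sheepdog_node_list_entry, node_cmp() and the
inline test in sd_leave_handler(), and the message text is the wording
suggested in the review above. The real patch calls panic() instead of
returning a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for sheepdog's node list entry. */
struct node_entry {
	unsigned char addr[16];
	unsigned short port;
};

/* Returns 0 when both entries describe the same node (assumed semantics). */
static int node_entry_cmp(const struct node_entry *a,
			  const struct node_entry *b)
{
	int c = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (c != 0)
		return c;
	return (int)a->port - (int)b->port;
}

/*
 * Returns an administrator-facing message if the node reported as "left"
 * is the local node itself, or NULL if the leave event is sane.
 * The caller decides how to react (the patch under review panics).
 */
static const char *leave_sanity_check(const struct node_entry *left,
				      const struct node_entry *self)
{
	if (node_entry_cmp(left, self) == 0)
		return "Network Partition Bug: I should have exited.\n";
	return NULL;
}
```

Returning the message rather than aborting keeps the sketch testable. A leave
handler would call leave_sanity_check(left, &this_node) first and stop the
daemon when the result is non-NULL.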


