[Sheepdog] [PATCH v2] sheep: fix a network partition issue
Liu Yuan
namei.unix at gmail.com
Tue Oct 25 09:06:05 CEST 2011
On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
> From: Yibin Shen <zituan at taobao.com>
>
> In some situations, sheep may be momentarily disconnected from corosync.
> Both sheep and corosync keep running and neither exits, so the
> disconnected sheep may receive a confchg message from corosync
> notifying it that this sheep itself has left.
> That leads to a network partition; this patch fixes it.
>
> Signed-off-by: Yibin Shen <zituan at taobao.com>
> ---
> sheep/group.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/sheep/group.c b/sheep/group.c
> index e22dabc..ab5a9f0 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> struct work_leave *w = NULL;
> int i, size;
>
> + if (node_cmp(left, &sys->this_node) == 0)
> + panic("BUG: this node can't be on the left list\n");
> +
Hmm, the panic output looks confusing. How about "Network Partition Bug:
I should have exited.\n"? The output will be seen by administrators,
not only programmers.
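
To make the suggestion concrete, here is a rough standalone sketch of the
check with the reworded message. It is not the sheep code: struct node_entry,
node_entry_cmp(), is_self_leave() and self_leave_message() are hypothetical
stand-ins for struct sheepdog_node_list_entry, node_cmp() and the panic()
call in the patch, and the address/port layout is only an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sheepdog_node_list_entry. */
struct node_entry {
	uint8_t addr[16];
	uint16_t port;
};

/* Hypothetical stand-in for node_cmp(): returns 0 for the same node. */
static int node_entry_cmp(const struct node_entry *a,
			  const struct node_entry *b)
{
	int c = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (c != 0)
		return c;
	return (a->port > b->port) - (a->port < b->port);
}

/* Returns 1 when the node reported as "left" is the local node itself. */
static int is_self_leave(const struct node_entry *left,
			 const struct node_entry *self)
{
	assert(left != NULL && self != NULL);
	return node_entry_cmp(left, self) == 0;
}

/* Message aimed at administrators rather than only at programmers. */
static const char *self_leave_message(void)
{
	return "Network Partition Bug: I should have exited.\n";
}
```

In the leave handler, the caller would panic with self_leave_message() when
is_self_leave() returns 1, which matches the behaviour of the patch but with
output an administrator can act on.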
Thanks,
Yuan