On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
> At Tue, 25 Oct 2011 15:06:05 +0800,
> Liu Yuan wrote:
>>
>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>
>>> From: Yibin Shen <zituan at taobao.com>
>>>
>>> In some situation, sheep may disconnected from corosync instantaneously,
>>> at the same time, both sheep and corosync will keep running but
>>> none of them exit, then the disconnected sheep may receive a confchg
>>> message from corosync which notify this sheep has left.
>>> that will lead to a network partition, this patch fix it.
>>>
>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>> ---
>>>  sheep/group.c |    3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/sheep/group.c b/sheep/group.c
>>> index e22dabc..ab5a9f0 100644
>>> --- a/sheep/group.c
>>> +++ b/sheep/group.c
>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>  	struct work_leave *w = NULL;
>>>  	int i, size;
>>>
>>> +	if (node_cmp(left, &sys->this_node) == 0)
>>> +		panic("BUG: this node can't be on the left list\n");
>>> +
>>
>>
>> Hmm, the panic output looks confusing. how about "Network Patition Bug:
>> I should have exited.\n"? since the output will be seen by
>> administrators, not only programmer.
>
> Applied after modifying output text, thanks!
>
> Kazutaka

Kazutaka,

Maybe we should not panic when the node becomes a single-node cluster.
The node will change into the HALT state, which does no harm to its data.

The problem is caused by a corosync bug that Yunkai has fixed in the
latest corosync. While working on the fix, Yunkai found that a corosync
node is likely to rejoin its old configuration soon after a short break
from the other corosync nodes.

For example, (a,b,c) is a corosync ring. At some point n(c) breaks away
and becomes a single-node ring by itself. Later, n(c) rejoins:

(a,b,c) -> (a,b), (c) -> (a,b,c)
        |             |
     confchg1      confchg2

So the question is: do we have to panic n(c) at confchg1 in this case,
given that n(c) does no harm to the data? After confchg2, I think, IIUC,
n(c) will see the view as (a,b,c) again. No?

Thanks,
Yuan
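
For reference, the (a,b,c) -> (a,b), (c) -> (a,b,c) sequence can be sketched as a toy model in C. This is only an illustration under assumptions: the names struct confchg, policy_panic and policy_halt are hypothetical and are not sheepdog or corosync APIs, and confchg1 is modelled the way it is described in this thread, that is, n(c) is told that it is itself on the left list. policy_halt is one possible reading of the "go to HALT instead of panic" suggestion, not an agreed design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions a node could take on a membership change. */
enum action { ACT_CONTINUE, ACT_HALT, ACT_PANIC };

/* Hypothetical, simplified view of one confchg event. */
struct confchg {
	const int *members;	/* node ids in the new view */
	size_t n_members;
	const int *left;	/* node ids reported as having left */
	size_t n_left;
};

static bool contains(const int *ids, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (ids[i] == id)
			return true;
	return false;
}

/* Behaviour of the applied patch: being on the left list is fatal. */
static enum action policy_panic(const struct confchg *c, int self)
{
	assert(c != NULL);
	return contains(c->left, c->n_left, self) ? ACT_PANIC : ACT_CONTINUE;
}

/*
 * Assumed alternative: halt while cut off from the others (either
 * reported as left, or alone in the view) and continue once a later
 * confchg shows this node back in a multi-node view.
 */
static enum action policy_halt(const struct confchg *c, int self)
{
	assert(c != NULL);
	if (contains(c->left, c->n_left, self))
		return ACT_HALT;
	if (c->n_members == 1 && c->members[0] == self)
		return ACT_HALT;
	return ACT_CONTINUE;
}
```

With self = c, feeding confchg1 (members a,b; left c) and then confchg2 (members a,b,c; nothing left) gives ACT_PANIC at confchg1 under policy_panic, whereas policy_halt returns ACT_HALT at confchg1 and ACT_CONTINUE at confchg2.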