On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
> From: Yibin Shen <zituan at taobao.com>
>
> In some situation, sheep may disconnected from corosync instantaneously,
> at the same time, both sheep and corosync will keep running but
> none of them exit, then the disconnected sheep may receive a confchg
> message from corosync which notify this sheep has left.
> that will lead to a network partition, this patch fix it.
>
> Signed-off-by: Yibin Shen <zituan at taobao.com>
> ---
>  sheep/group.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/sheep/group.c b/sheep/group.c
> index e22dabc..ab5a9f0 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>  	struct work_leave *w = NULL;
>  	int i, size;
>
> +	if (node_cmp(left, &sys->this_node) == 0)
> +		panic("BUG: this node can't be on the left list\n");
> +

Hmm, the panic output looks confusing. How about
"Network Partition Bug: I should have exited.\n"? The output will be
seen by administrators, not only by programmers.

Thanks,
Yuan
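
P.S. To make the suggestion concrete, here is a rough standalone sketch of the check with the reworded message. It is only an illustration, not the actual sheep/group.c code: struct node_entry, this simplified node_cmp(), left_is_self() and self_leave_msg are hypothetical stand-ins I made up so the snippet compiles on its own, and the real handler would call panic() with the message rather than return a flag.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct sheepdog_node_list_entry. */
struct node_entry {
	unsigned char addr[16];
	unsigned short port;
};

/* Simplified comparison (assumption): 0 when both entries name the same node. */
static int node_cmp(const struct node_entry *a, const struct node_entry *b)
{
	int ret = memcmp(a->addr, b->addr, sizeof(a->addr));

	if (ret)
		return ret;
	return (int)a->port - (int)b->port;
}

/* Message aimed at administrators rather than only at programmers. */
static const char *self_leave_msg =
	"Network Partition Bug: I should have exited.\n";

/*
 * Returns 1 when the node reported as "left" by the confchg message is
 * this node itself, i.e. the case the patch wants to treat as fatal.
 */
static int left_is_self(const struct node_entry *left,
			const struct node_entry *self)
{
	if (node_cmp(left, self) == 0) {
		fputs(self_leave_msg, stderr);
		return 1;
	}
	return 0;
}
```

In the leave handler, the equivalent would presumably be a single panic() call with that string at the same place the patch adds its check.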