[Sheepdog] [PATCH] sheep: don't exit when sheep calls leave_cluster()
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Thu Nov 24 17:24:23 CET 2011
At Thu, 24 Nov 2011 14:35:03 +0900,
MORITA Kazutaka wrote:
>
> At Thu, 24 Nov 2011 11:55:01 +0800,
> Liu Yuan wrote:
> >
> > From: Liu Yuan <tailai.ly at taobao.com>
> >
> > When an unrecoverable error happens, the sheep daemon leaves the cluster but
> > stays up as a gateway to redirect requests.
> >
> > For example, in the following case sheep hits an EIO:
> > ...
> > Nov 24 10:36:15 do_io_request(785) failed: 2, 2, 7c2b2500000000 , 1, 3
> > Nov 24 10:36:15 io_op_done(147) leaving sheepdog cluster
> > Nov 24 10:36:15 sd_leave_handler(1291) network partition bug: this sheep should have exited
> > Nov 24 10:36:15 log_sigsegv(358) logger pid 8255 exiting abnormally
> > ...
> >
> > This has nothing to do with network partitions.
> >
> > Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> > ---
> > sheep/group.c | 3 ---
> > 1 files changed, 0 insertions(+), 3 deletions(-)
> >
> > diff --git a/sheep/group.c b/sheep/group.c
> > index f126de5..31d1f76 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1287,9 +1287,6 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> >  	struct work_leave *w = NULL;
> >  	int i, size;
> >
> > -	if (node_cmp(left, &sys->this_node) == 0)
> > -		panic("network partition bug: this sheep should have exited\n");
> > -
> >  	dprintf("leave %s\n", node_to_str(left));
> >  	for (i = 0; i < nr_members; i++)
> >  		dprintf("[%x] %s\n", i, node_to_str(members + i));
>
> It is better to stop calling the join/leave handlers after the node
> leaves the cluster. That is how Sheepdog behaved before the cluster
> driver was introduced.
Sorry, I was wrong. This patch is correct because the gateway needs
to receive join/leave notifications to update the consistent hash
ring.
Applied this patch, thanks!
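To illustrate the point about the hash ring, here is a minimal sketch of
a gateway-side ring that is updated from join/leave notifications. It is
not the code in sheep/group.c: struct ring, hash64(), ring_join(),
ring_leave() and ring_lookup() are hypothetical names, and the FNV-1a
hash and the one-position-per-node layout are assumptions made only to
keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical, simplified ring; not the structures used by sheep. */
struct ring {
	uint64_t pos[MAX_NODES];	/* hashed node positions, kept sorted */
	int id[MAX_NODES];		/* node id stored at the same index */
	int nr;				/* number of nodes on the ring */
};

/* FNV-1a 64-bit hash over the eight bytes of v (assumed hash function). */
static uint64_t hash64(uint64_t v)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (int i = 0; i < 8; i++) {
		h ^= (v >> (i * 8)) & 0xff;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Join notification: insert the node at its sorted ring position. */
static void ring_join(struct ring *r, int node_id)
{
	uint64_t p = hash64((uint64_t)node_id);
	int i = r->nr;

	assert(r->nr < MAX_NODES);
	while (i > 0 && r->pos[i - 1] > p) {
		r->pos[i] = r->pos[i - 1];
		r->id[i] = r->id[i - 1];
		i--;
	}
	r->pos[i] = p;
	r->id[i] = node_id;
	r->nr++;
}

/* Leave notification: drop the node from the ring, if it is present. */
static void ring_leave(struct ring *r, int node_id)
{
	for (int i = 0; i < r->nr; i++) {
		size_t tail = (size_t)(r->nr - i - 1);

		if (r->id[i] != node_id)
			continue;
		memmove(&r->pos[i], &r->pos[i + 1], tail * sizeof(r->pos[0]));
		memmove(&r->id[i], &r->id[i + 1], tail * sizeof(r->id[0]));
		r->nr--;
		return;
	}
}

/* Return the first node clockwise from the object's hash, -1 if empty. */
static int ring_lookup(const struct ring *r, uint64_t oid)
{
	uint64_t p = hash64(oid);

	if (r->nr == 0)
		return -1;
	for (int i = 0; i < r->nr; i++)
		if (r->pos[i] >= p)
			return r->id[i];
	return r->id[0];	/* wrap around to the start of the ring */
}
```

In this sketch, a gateway that never runs ring_leave() for a departed
node keeps getting that node back from ring_lookup() and keeps
redirecting requests to it, which is why the handlers still have to be
called after the gateway itself has left the cluster.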
Thanks,
Kazutaka