[Sheepdog] [PATCH] sheep: don't exit when sheep calls leave_cluster()

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Nov 24 17:24:23 CET 2011


At Thu, 24 Nov 2011 14:35:03 +0900,
MORITA Kazutaka wrote:
> 
> At Thu, 24 Nov 2011 11:55:01 +0800,
> Liu Yuan wrote:
> > 
> > From: Liu Yuan <tailai.ly at taobao.com>
> > 
> > When an unrecoverable error happens, the sheep daemon leaves the cluster but stays
> > up as a gateway that redirects requests.
> > 
> > For example, in the following case sheep hits an EIO:
> > ...
> > Nov 24 10:36:15 do_io_request(785) failed: 2, 2, 7c2b2500000000 , 1, 3
> > Nov 24 10:36:15 io_op_done(147) leaving sheepdog cluster
> > Nov 24 10:36:15 sd_leave_handler(1291) network partition bug: this sheep should have exited
> > Nov 24 10:36:15 log_sigsegv(358) logger pid 8255 exiting abnormally
> > ...
> > 
> > This has nothing to do with network partitions.
> > 
> > Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> > ---
> >  sheep/group.c |    3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > diff --git a/sheep/group.c b/sheep/group.c
> > index f126de5..31d1f76 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1287,9 +1287,6 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
> >  	struct work_leave *w = NULL;
> >  	int i, size;
> >  
> > -	if (node_cmp(left, &sys->this_node) == 0)
> > -		panic("network partition bug: this sheep should have exited\n");
> > -
> >  	dprintf("leave %s\n", node_to_str(left));
> >  	for (i = 0; i < nr_members; i++)
> >  		dprintf("[%x] %s\n", i, node_to_str(members + i));
> 
> It would be better to stop calling the join/leave handlers after the node
> leaves the cluster.  That is how Sheepdog behaved before the cluster
> driver was introduced.

Sorry, I was wrong.  This patch is correct because the gateway needs
to receive join/leave notifications to update the consistent hash
ring.
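To illustrate why a gateway must keep processing join/leave events, here is a minimal C sketch of a consistent hash ring. It is not sheep's actual code: struct ring, ring_join(), ring_leave(), ring_lookup(), the FNV-1a hash and VNODES_PER_NODE are all hypothetical names and choices made for this example. The point it shows is that ring_leave() applies the event even when the departing node is the local one (the case the removed panic() rejected), so a node that has left and now only redirects requests still routes object IDs to the surviving members.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 16
#define VNODES_PER_NODE 4 /* hypothetical: virtual nodes per physical node */

/* Hypothetical node identity; sheep itself uses struct sheepdog_node_list_entry. */
struct node {
	char addr[32];
};

struct vnode {
	uint64_t hash;
	int node_idx;
};

struct ring {
	struct node nodes[MAX_NODES];
	int nr_nodes;
	struct vnode vnodes[MAX_NODES * VNODES_PER_NODE];
	int nr_vnodes;
};

/* FNV-1a 64-bit hash, chained through the seed argument h. */
uint64_t fnv1a(const void *buf, size_t len, uint64_t h)
{
	const unsigned char *p = buf;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

int vnode_cmp(const void *a, const void *b)
{
	const struct vnode *x = a, *y = b;
	return (x->hash > y->hash) - (x->hash < y->hash);
}

/* Recompute and sort every virtual node after a membership change. */
void rebuild_vnodes(struct ring *r)
{
	assert(r->nr_nodes <= MAX_NODES);
	r->nr_vnodes = 0;
	for (int i = 0; i < r->nr_nodes; i++) {
		uint64_t h = fnv1a(r->nodes[i].addr, strlen(r->nodes[i].addr),
				   0xcbf29ce484222325ULL);
		for (int v = 0; v < VNODES_PER_NODE; v++) {
			h = fnv1a(&v, sizeof(v), h);
			r->vnodes[r->nr_vnodes].hash = h;
			r->vnodes[r->nr_vnodes].node_idx = i;
			r->nr_vnodes++;
		}
	}
	qsort(r->vnodes, r->nr_vnodes, sizeof(r->vnodes[0]), vnode_cmp);
}

/* Join event: add the node (ignoring duplicates) and rebuild the ring. */
int ring_join(struct ring *r, const char *addr)
{
	for (int i = 0; i < r->nr_nodes; i++)
		if (strcmp(r->nodes[i].addr, addr) == 0)
			return 0;
	if (r->nr_nodes >= MAX_NODES)
		return -1;
	snprintf(r->nodes[r->nr_nodes].addr, sizeof(r->nodes[0].addr), "%s", addr);
	r->nr_nodes++;
	rebuild_vnodes(r);
	return 0;
}

/* Leave event: applied even if addr is this node, so a gateway stays in sync. */
int ring_leave(struct ring *r, const char *addr)
{
	for (int i = 0; i < r->nr_nodes; i++) {
		if (strcmp(r->nodes[i].addr, addr) == 0) {
			memmove(&r->nodes[i], &r->nodes[i + 1],
				(size_t)(r->nr_nodes - i - 1) * sizeof(r->nodes[0]));
			r->nr_nodes--;
			rebuild_vnodes(r);
			return 0;
		}
	}
	return -1; /* unknown node */
}

/* Map an object ID to the first vnode clockwise from its hash. */
const char *ring_lookup(const struct ring *r, uint64_t oid)
{
	if (r->nr_vnodes == 0)
		return NULL;
	uint64_t h = fnv1a(&oid, sizeof(oid), 0xcbf29ce484222325ULL);
	for (int i = 0; i < r->nr_vnodes; i++)
		if (r->vnodes[i].hash >= h)
			return r->nodes[r->vnodes[i].node_idx].addr;
	return r->nodes[r->vnodes[0].node_idx].addr; /* wrap around */
}
```

For example, after ring_join() for three nodes and ring_leave() for the local node's own address, ring_lookup() only returns the two remaining nodes. If the leave event had been dropped (or had panicked, as before the patch), the gateway would keep forwarding requests to itself.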

Applied this patch, thanks!

Thanks,

Kazutaka


