[Sheepdog] [PATCH] sheep: remove cdrv_handlers and check_join_cb

Huxinwei huxinwei at huawei.com
Wed Apr 25 11:39:33 CEST 2012


> -----Original Message-----
> From: sheepdog-bounces at lists.wpkg.org
> [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of Christoph Hellwig
> Sent: Wednesday, April 25, 2012 3:54 PM
> To: Liu Yuan
> Cc: sheepdog at lists.wpkg.org
> Subject: Re: [Sheepdog] [PATCH] sheep: remove cdrv_handlers and
> check_join_cb
> 
> On Wed, Apr 25, 2012 at 03:51:30PM +0800, Liu Yuan wrote:
> > I am more interested in how you plan to deal with block_cb(). We
> > have already hit a subtle problem where the cluster hangs in the block
> > state forever when running 1000 sheep daemons on a dozen machines, but
> > have not yet reached any useful conclusion. We can only say that the block
> > mechanism leaves holes that let just a few failed nodes (whether they
> > exited on EIO or went down) hang the whole cluster.

What's the specific problem you had?
Several times I have found that sheep fails to elect a master.
It turned out that the first node had failed before it unblocked the other nodes' join messages.
When that happens, you have to restart all sheep daemons to recover.
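
To make the failure mode above concrete, here is a toy model of it. It is only a sketch under stated assumptions: it is not sheepdog or corosync code, and the ToyCluster class, its methods, and the release_on_leave option are all hypothetical names made up for illustration. It assumes join messages are delivered in order and that delivery stops while one node holds a block.

```python
from collections import deque


class ToyCluster:
    """Toy ordered event queue. NOT sheepdog or corosync code."""

    def __init__(self, release_on_leave=False):
        self.pending = deque()   # join requests waiting for delivery
        self.members = []        # nodes whose join has been delivered
        self.blocker = None      # node currently holding the block
        # Hypothetical mitigation: drop a block when its owner fails.
        self.release_on_leave = release_on_leave

    def join(self, node):
        self.pending.append(node)
        self._deliver()

    def block(self, node):
        self.blocker = node

    def unblock(self, node):
        if self.blocker == node:
            self.blocker = None
            self._deliver()

    def fail(self, node):
        if node in self.members:
            self.members.remove(node)
        if self.release_on_leave and self.blocker == node:
            self.blocker = None
            self._deliver()

    def _deliver(self):
        # Joins are delivered in order, and only while nobody blocks.
        while self.pending and self.blocker is None:
            self.members.append(self.pending.popleft())


def run(release_on_leave):
    c = ToyCluster(release_on_leave)
    c.join("node1")    # first node joins an empty cluster
    c.block("node1")   # node1 blocks further events while it works
    c.join("node2")    # queued behind the block
    c.join("node3")
    c.fail("node1")    # node1 dies before it ever calls unblock()
    return c.members, list(c.pending)


if __name__ == "__main__":
    print(run(release_on_leave=False))
    print(run(release_on_leave=True))
```

With release_on_leave=False the two later joins stay queued and no node is a member, which matches the "restart everything" symptom. With release_on_leave=True the queue drains once the failed blocker is removed. Whether a real cluster driver can safely drop the block this way is exactly the open question in this thread.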

I thought it was corosync-specific. Or are there more subtle issues there?

> I haven't looked into a better scheme yet - I just identified that the
> area needs way more work than a simple cleanup, that's why I didn't
> touch it for now.


