On Thu, Apr 26, 2012 at 6:18 PM, Huxinwei <huxinwei at huawei.com> wrote:
>> -----Original Message-----
>> From: Liu Yuan [mailto:namei.unix at gmail.com]
>> Sent: Thursday, April 26, 2012 12:06 AM
>> To: Huxinwei
>> Cc: Christoph Hellwig; sheepdog at lists.wpkg.org
>> Subject: Re: [Sheepdog] [PATCH] sheep: remove cdrv_handlers and
>> check_join_cb
>>
>> On 04/25/2012 05:39 PM, Huxinwei wrote:
>> > What's the specific problem you had ?
>> > There're several times I found that sheep fails to elect a master.
>> > It turns out to be the first nodes failed before it unblocks other joining messages.
>> > When it happened, you have to restart all sheeps to recover.
>>
>> I don't think this would happen for corosync. There is a tested mechanism to
>> transfer mastership for this very case.
> But it does happen.. :P

I guess there is a bug: if the master node crashes before sending a join
response, the next master needs to send the join response in place of the
previous master to unblock the cevent queue.

Thanks,

Kazutaka
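
To make the failover described above concrete, here is a minimal toy model of it in Python. It is not sheep's code: the `Cluster` class, its method names, and the rule that the oldest member is the master are all assumptions made up for illustration. It only shows the idea that when the master leaves with join requests still unanswered, the new master answers them so the queue is unblocked.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical): members are ordered by join time and the
    oldest one acts as master."""
    members: list                                         # node ids, oldest first
    pending_joins: deque = field(default_factory=deque)   # unanswered join requests
    blocked: bool = False                                 # True while a join awaits a response

    @property
    def master(self):
        return self.members[0] if self.members else None

    def request_join(self, node):
        # A join request blocks the event queue until the master responds.
        self.pending_joins.append(node)
        self.blocked = True

    def send_join_response(self, sender):
        # Only the current master may answer. The answer admits the joiner
        # and unblocks the queue once no requests are left.
        if sender != self.master or not self.pending_joins:
            return False
        self.members.append(self.pending_joins.popleft())
        self.blocked = bool(self.pending_joins)
        return True

    def node_left(self, node):
        # Handle a crash. If the crashed node was the master and joins are
        # still pending, the new master answers them in its place.
        was_master = node == self.master
        self.members.remove(node)
        if was_master and self.master is not None:
            while self.pending_joins:
                self.send_join_response(self.master)


cluster = Cluster(members=["a", "b"])
cluster.request_join("c")   # "a" is master and has not answered yet
cluster.node_left("a")      # master crashes before sending the join response
print(cluster.master, cluster.members, cluster.blocked)
```

Without the loop in `node_left`, the request from "c" would stay pending and `blocked` would remain true forever, which matches the symptom reported in the quoted message (no recovery until every sheep is restarted).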