[sheepdog] [PATCH v7 3/7] sheep: rejoin cluster after a zookeeper session timeout

Wed Jun 26 03:28:43 CEST 2013

At Wed, 26 Jun 2013 09:12:06 +0800,
Kai Zhang wrote:
> 
> On Jun 25, 2013, at 11:06 PM, Hitoshi Mitake <mitake.hitoshi at gmail.com> wrote:
> 
> > As you say, rejoining would be the only way to handle a session timeout
> > correctly. But the current zookeeper driver causes serious problems
> > when network failures happen (e.g. inconsistent epochs).
> > 
> > So I believe calling panic() or exit() would be better than doing
> > nothing. If sheep processes using the zookeeper driver exit immediately
> > in the above case, we can restart them manually.
> > # I understand this solution goes against the policy of sheepdog... :(
> > 
> 
> I see. Do you mean a separate patch based on upstream, or one based on
> PATCH 1/7 and 2/7?
> 
> Because these patches have been reviewed by Kazutaka and Yuan, 
> I think they will be merged soon after some minor modifications.

I think patches 1 - 5 would make a good standalone patchset.

> 
> Would you mind if we merge the whole series to the stable branch later?
> 

Of course not, please go ahead. Your zookeeper improvement is very
important for the safe operation of sheepdog.

> > And our internal team needs the solution by this Thursday (we have
> > a local change for this problem, but it is a temporary and dirty
> > thing). If you can help us, I'd be very happy :)
> 
> Our team has also been waiting for this patch for a long time :)

:)

Thanks,
Hitoshi