[sheepdog] [PATCH v2] zookeeper: exit program on unrecoverable error

MORITA Kazutaka morita.kazutaka at gmail.com
Mon Jun 17 02:07:19 CEST 2013


At Wed, 12 Jun 2013 22:01:21 +0800,
Kai Zhang wrote:
> 
> On Jun 12, 2013, at 7:35 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
> 
> > Well, why is it better?  Is it easier to rebase my patch onto yours?
> > My patch fixes a critical problem that epoch information will be
> > corrupted after session timeout.  Currently, there is no way to fix
> > the broken epoch other than re-formatting the cluster.
> > 
> 
> Sorry, I didn't get that this patch will fix the epoch broken issue.
> What I said "better" is based on this patch does not fix any critical issue.
> Could you explain some details on this? 
> And I guess this is caused by the concurrent start up. But I'm not quite sure.

On second thought, this patch doesn't seem to break epoch
directly.  However, without this patch, zk_queue_peek() can wrongly
return false when a session timeout happens, and sheep can then start
a block operation while another block operation is still ongoing.
This might be the reason I saw an epoch corruption, but I'm not sure
now.
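
To illustrate the failure mode, here is a minimal sketch.  It is not
the actual sheepdog code: the enum zk_rc values and the peek_lossy()
and peek_strict() helpers are hypothetical stand-ins for the result of
a ZooKeeper call and for zk_queue_peek().  The point is only that
"queue is empty" and "session is gone" must not be reported through
the same false return value.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the result codes of a ZooKeeper call. */
enum zk_rc {
	RC_OK,			/* the queue node exists */
	RC_NONODE,		/* the queue node does not exist */
	RC_SESSION_EXPIRED,	/* the session timed out */
};

/*
 * Lossy variant: every failure, including session expiry, is reported
 * as "nothing queued", so the caller cannot tell an empty queue from a
 * broken session.
 */
static bool peek_lossy(enum zk_rc rc)
{
	return rc == RC_OK;
}

/*
 * Strict variant: only a missing node means "nothing queued".  Any
 * other error is treated as unrecoverable and terminates the program
 * instead of returning a misleading false.
 */
static bool peek_strict(enum zk_rc rc)
{
	switch (rc) {
	case RC_OK:
		return true;
	case RC_NONODE:
		return false;
	default:
		fprintf(stderr, "unrecoverable zookeeper error %d\n", (int)rc);
		exit(1);
	}
}
```

With peek_lossy(), a session timeout looks the same as an empty queue,
which is the kind of wrong false described above.  peek_strict() never
returns in that case.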

Anyway, my patch is smaller and simpler than yours and it's easier for
me to rebase mine onto yours.  I'll update my patch after merging your
series.

Thanks,

Kazutaka


