[sheepdog] [PATCH 0/3] zookeeper: fix error handling

Wed May 29 14:38:39 CEST 2013

On May 29, 2013, at 6:37 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:

> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> 
> The first two patches fix problems which happen under heavy network
> traffic.  The third patch is a clean-up one.
> 
> MORITA Kazutaka (3):
>  zookeeper: retry zk_create_seq_node on retryable error
>  zookeeper: use panic instead of assert for error handling
>  zookeeper: use offsetof to calculate offset
> 
> sheep/cluster/zookeeper.c |   88 +++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 77 insertions(+), 11 deletions(-)
> 
> -- 
> 1.7.9.5

Hi MORITA,

I'm very happy to see this patch.
Actually, I'm also working on this and had the same idea for handling failures when creating a sequential node (I use uuid_generate() instead).
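
To illustrate the idea, here is a minimal, self-contained sketch of retrying a sequential-node create when the reply may be lost. It is not the code from either patch: the fake_* functions and the in-memory node table are hypothetical stand-ins for the ZooKeeper client calls, and a plain string takes the place of an ID produced by uuid_generate(). The point is only that the node carries a unique ID, so a retry can first check whether an earlier attempt already succeeded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 16
#define ID_LEN 37            /* room for a 36-char UUID string plus NUL */

enum { FAKE_OK = 0, FAKE_CONN_LOSS = -1 };

/* Hypothetical in-memory stand-in for the ZooKeeper server. */
struct fake_node { int seq; char id[ID_LEN]; };
static struct fake_node nodes[MAX_NODES];
static int node_count;
static int lose_replies;     /* number of upcoming replies to drop */

/* Create a sequential node tagged with id. The reply may get lost
 * even though the node was created on the server side. */
static int fake_create_seq(const char *id, int *seq)
{
	assert(node_count < MAX_NODES);
	nodes[node_count].seq = node_count;
	snprintf(nodes[node_count].id, ID_LEN, "%s", id);
	node_count++;
	if (lose_replies > 0) {
		lose_replies--;
		return FAKE_CONN_LOSS;
	}
	*seq = node_count - 1;
	return FAKE_OK;
}

/* Look for a node carrying our id; returns its sequence number or -1. */
static int fake_find_by_id(const char *id)
{
	for (int i = 0; i < node_count; i++)
		if (strcmp(nodes[i].id, id) == 0)
			return nodes[i].seq;
	return -1;
}

/* Retry until the node exists exactly once; returns its sequence number. */
static int create_seq_node_retry(const char *id)
{
	int seq;

	for (;;) {
		/* An earlier attempt may have succeeded without us noticing. */
		seq = fake_find_by_id(id);
		if (seq >= 0)
			return seq;
		if (fake_create_seq(id, &seq) == FAKE_OK)
			return seq;
		/* FAKE_CONN_LOSS: outcome unknown, so check again and retry. */
	}
}
```

With lose_replies set to 1, the first create succeeds on the fake server but reports a connection loss; the retry then finds the node by its ID instead of creating a duplicate.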

However, there is still one unsolved problem before I can submit my patch:
zookeeper cannot handle a session timeout.
In that situation, the ephemeral node is deleted by the zookeeper server, and the other sd nodes assume the node has left.

Is there a way for sheep to rejoin the cluster instead of panicking?
Currently a sheep panic causes a qemu restart, which should be avoided in a production environment.

Thanks,
Kyle