[sheepdog] [PATCH v3 5/5] zookeeper: handle session timeout for all zookeeper operations

MORITA Kazutaka morita.kazutaka at gmail.com
Mon Jun 17 18:22:13 CEST 2013


At Mon, 17 Jun 2013 05:28:46 -0700,
Kai Zhang wrote:
> 
> The idea is: when a zk_* API returns ZINVALIDSTATE, it means the connection
> and session to zookeeper have been lost.
> At this point, callers of zk_* APIs should do nothing but return control as
> soon as possible.
> Another thread will be responsible for cleaning up in-memory state,
> re-connecting to zookeeper and re-sending the join request.
> 
> Signed-off-by: Kai Zhang <kyle at zelin.io>
> ---
>  sheep/cluster/zookeeper.c |  245 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 163 insertions(+), 82 deletions(-)
> 
> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
> index ca113dc..eec1e2e 100644
> --- a/sheep/cluster/zookeeper.c
> +++ b/sheep/cluster/zookeeper.c
> @@ -33,8 +33,7 @@
>  
>  /* iterate child znodes */
>  #define FOR_EACH_ZNODE(parent, path, strs)			       \
> -	for (zk_get_children(parent, strs),			       \
> -		     (strs)->data += (strs)->count;		       \
> +	for ((strs)->data += (strs)->count;			       \
>  	     (strs)->count-- ?					       \
>  		     snprintf(path, sizeof(path), "%s/%s", parent,     \
>  			      *--(strs)->data) : (free((strs)->data), 0); \
> @@ -76,6 +75,7 @@ static LIST_HEAD(zk_block_list);
>  static uatomic_bool is_master;
>  static uatomic_bool stop;
>  static bool first_push = true;
> +static uint64_t zk_flying_ops;
>  
>  static void zk_compete_master(void);
>  
> @@ -140,28 +140,39 @@ static inline struct zk_node *zk_tree_search(const struct node_id *nid)
>  static zhandle_t *zhandle;
>  static struct zk_node this_node;
>  
> +#define check_zk_rc(rc, path)						\
> +	if (rc != ZOK && rc != ZNONODE && rc != ZNODEEXISTS &&		\
> +	    rc != ZINVALIDSTATE)					\
> +		panic("failed, path:%s, %s", path, zerror(rc));
> +

In my environment, zoo_exists() can return ZSESSIONEXPIRED, which I
think should also pass this check.
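
A minimal sketch of how the check might be widened, not a tested
replacement for the patch. zk_rc_tolerated() is a hypothetical helper
name, abort() stands in for sheepdog's panic(), and the enum only mirrors
the codes from <zookeeper/zookeeper.h> so the fragment compiles on its
own. The do/while wrapper is an extra suggestion so that the macro
behaves as a single statement after an unbraced if/else.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for codes from <zookeeper/zookeeper.h>; real code would
 * include that header instead of redefining them. */
enum {
	ZOK = 0,
	ZINVALIDSTATE = -9,
	ZNONODE = -101,
	ZNODEEXISTS = -110,
	ZSESSIONEXPIRED = -112,
};

/* Hypothetical helper: nonzero if rc should not trigger a panic. */
static inline int zk_rc_tolerated(int rc)
{
	return rc == ZOK || rc == ZNONODE || rc == ZNODEEXISTS ||
	       rc == ZINVALIDSTATE || rc == ZSESSIONEXPIRED;
}

/* abort() is an assumption standing in for panic(). */
#define check_zk_rc(rc, path)						\
	do {								\
		if (!zk_rc_tolerated(rc)) {				\
			fprintf(stderr, "failed, path:%s, rc:%d\n",	\
				(path), (rc));				\
			abort();					\
		}							\
	} while (0)
```

With this shape, check_zk_rc(ZSESSIONEXPIRED, path) falls through and the
caller can treat it the same way as ZINVALIDSTATE.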

> @@ -191,11 +204,14 @@ zk_create_seq_node(const char *path, const char *value, int valuelen,
>  		   char *path_buffer, int path_buffer_len)
>  {
>  	int rc;
> +	uatomic_inc(&zk_flying_ops);
>  	rc = zoo_create(zhandle, path, value, valuelen, &ZOO_OPEN_ACL_UNSAFE,
>  			ZOO_SEQUENCE, path_buffer, path_buffer_len);
> +	uatomic_dec(&zk_flying_ops);
> +	check_zk_rc(rc, path);
> +

This causes a panic when rc is ZOPERATIONTIMEOUT or ZCONNECTIONLOSS,
because check_zk_rc() does not accept either of them.
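
One possible way to avoid the panic is to classify return codes before
deciding what to do. The sketch below is only an illustration under
assumptions: zk_classify_rc(), zk_retry(), the zk_rc_class enum and
fake_flaky_op() are hypothetical names, and the error-code enum is a
stand-in for <zookeeper/zookeeper.h>.

```c
/* Stand-ins for codes from <zookeeper/zookeeper.h>; real code would
 * include that header instead of redefining them. */
enum {
	ZOK = 0,
	ZCONNECTIONLOSS = -4,
	ZOPERATIONTIMEOUT = -7,
	ZINVALIDSTATE = -9,
	ZNONODE = -101,
	ZNODEEXISTS = -110,
	ZSESSIONEXPIRED = -112,
};

/* Hypothetical classification of a zookeeper return code. */
enum zk_rc_class {
	ZK_RC_OK,		/* success or an expected result */
	ZK_RC_RETRY,		/* transient, the call may be retried */
	ZK_RC_SESSION_LOST,	/* return control, let the session be rebuilt */
	ZK_RC_FATAL,		/* anything else: panic */
};

static enum zk_rc_class zk_classify_rc(int rc)
{
	switch (rc) {
	case ZOK:
	case ZNONODE:
	case ZNODEEXISTS:
		return ZK_RC_OK;
	case ZCONNECTIONLOSS:
	case ZOPERATIONTIMEOUT:
		return ZK_RC_RETRY;
	case ZINVALIDSTATE:
	case ZSESSIONEXPIRED:
		return ZK_RC_SESSION_LOST;
	default:
		return ZK_RC_FATAL;
	}
}

typedef int (*zk_op_fn)(void *arg);

/* Call op until it stops returning a transient error or max_tries is
 * reached; returns the last rc seen. */
static int zk_retry(zk_op_fn op, void *arg, int max_tries)
{
	int rc = ZOPERATIONTIMEOUT;

	for (int i = 0; i < max_tries; i++) {
		rc = op(arg);
		if (zk_classify_rc(rc) != ZK_RC_RETRY)
			break;
	}
	return rc;
}

/* Illustration only: fails with ZCONNECTIONLOSS *arg times, then
 * succeeds. */
static int fake_flaky_op(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return ZCONNECTIONLOSS;
	}
	return ZOK;
}
```

Blind retry is not always safe. After ZCONNECTIONLOSS the client cannot
tell whether the request reached the server, so retrying a create with
ZOO_SEQUENCE may leave a duplicate sequence node behind. Such calls
probably need extra care rather than a plain loop.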

Thanks,

Kazutaka


