[sheepdog] [PATCH] zookeeper: hande node joining race

Christoph Hellwig hch at infradead.org
Tue Jun 5 14:17:55 CEST 2012


So what is the conclusion here?  Even if the driver is not designed
for fast joining I really think the current failure mode is the worst
possible one.

Can I expect a resend to get applied?  Should I make any changes?

On Mon, May 28, 2012 at 11:25:24AM -0400, Christoph Hellwig wrote:
> I got a bug report where the nr_sd_nodes == nr_zk_nodes assert in
> build_node_list is trigger by a larger number of sheep joining at the same
> time.
> 
> >From code inspection it seems like this case can be triggered if
> node_btree_add is called twice for the same node, and tsearch returns
> the existing node, but nr_zk_nodes is still incremented.
> 
> It seems like node_btree_add could be called twice if the startup of
> the zookeeper driver hits a race between reading out the list of existing
> nodes and new events.
> 
> The patch below fixes the problem by handling this case in node_btree_add
> and asserting it only happens for valid case.  I'm not overly happy with
> this fix, fixes based on better knowledge of zookeeper would be welcome.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> 
> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
> index 859f2b0..a09d55a 100644
> --- a/sheep/cluster/zookeeper.c
> +++ b/sheep/cluster/zookeeper.c
> @@ -330,7 +330,7 @@ static inline int zk_node_cmp(const void *a, const void *b)
>  	return node_cmp(&znode1->node, &znode2->node);
>  }
>  
> -static void node_btree_add(void **btroot, struct zk_node *znode)
> +static int node_btree_add(void **btroot, struct zk_node *znode)
>  {
>  	struct zk_node *n, **p;
>  
> @@ -343,11 +343,14 @@ static void node_btree_add(void **btroot, struct zk_node *znode)
>  	p = (struct zk_node **)tsearch((void *)n, btroot, zk_node_cmp);
>  	if (p == NULL)
>  		panic("tsearch, oom\n");
> -	else if (*p != n) {
> +
> +	if (*p != n) {
>  		**p = *n;
>  		free(n);
> +		return -1;
>  	}
>  	nr_zk_nodes++;
> +	return 0;
>  }
>  
>  static inline void node_btree_del(void **btroot, struct zk_node *znode)
> @@ -466,7 +469,10 @@ static void zk_member_init(zhandle_t *zh)
>  
>  			switch (rc) {
>  			case ZOK:
> -				node_btree_add(&zk_node_btroot, &znode);
> +				rc = node_btree_add(&zk_node_btroot, &znode);
> +				if (rc)
> +					panic("node %s already exists\n", path);
> +				break;
>  			case ZNONODE:
>  				break;
>  			default:
> @@ -770,7 +776,13 @@ static void zk_handler(int listen_fd, int events, void *data)
>  			 */
>  			node_btree_clear(&zk_node_btroot);
>  
> -		node_btree_add(&zk_node_btroot, &ev.sender);
> +		rc = node_btree_add(&zk_node_btroot, &ev.sender);
> +		if (rc) {
> +			if (node_eq(&ev.sender.node, &this_node.node))
> +				panic("this_node already in btree\n");
> +			dprintf("node %s already existed\n",
> +				node_to_str(&ev.sender.node));
> +		}
>  		dprintf("one sheep joined[down], nr_nodes:%ld, sender:%s, joined:%d\n",
>  				nr_zk_nodes, node_to_str(&ev.sender.node), ev.sender.joined);
>  
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
---end quoted text---



More information about the sheepdog mailing list