[sheepdog] [PATCH] zookeeper: check queue_pos znode before join cluster

Saeki Masaki saeki.masaki at po.ntts.co.jp
Thu Mar 19 12:11:35 CET 2015


On 2015/03/19 0:08, Meng Lingkun wrote:
> Starting sheep immediately after shutdown causes errors in "dog node list".
> The bug is described in [1]. Check the queue_pos znode before joining the
> cluster, just as is already done for the member znode.
> [1] https://bugs.launchpad.net/sheepdog-project/+bug/1433452
>
> Signed-off-by: Meng Lingkun <menglingkun at cmss.chinamobile.com>
> ---
>   sheep/cluster/zookeeper.c |   11 ++++++++---
>   1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
> index 303449e..690f054 100644
> --- a/sheep/cluster/zookeeper.c
> +++ b/sheep/cluster/zookeeper.c
> @@ -1004,14 +1004,19 @@ out_unlock:
>   static int zk_join(const struct sd_node *myself,
>   		   void *opaque, size_t opaque_len)
>   {
> -	int rc;
> +	int rc1, rc2;
>   	char path[MAX_NODE_STR_LEN];
>
>   	this_node.node = *myself;
>
>   	snprintf(path, sizeof(path), MEMBER_ZNODE "/%s", node_to_str(myself));
> -	rc = zk_node_exists(path);
> -	if (rc == ZOK) {
> +	rc1 = zk_node_exists(path);
> +
> +	snprintf(path, sizeof(path), QUEUE_POS_ZNODE "/%s",
> +		node_to_str(myself));
> +	rc2 = zk_node_exists(path);
> +
> +	if (rc1 == ZOK || rc2 == ZOK) {
>   		sd_err("Previous zookeeper session exist, shoot myself. Please "
>   			"wait for %d seconds to join me again.",
>   			DIV_ROUND_UP(zk_timeout, 1000));
>

Thank you, Meng.

I tested this by reproducing the situation from the Launchpad report,
and it works well.
(The inconsistency no longer occurs.)
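
For anyone reading the archive later, the combined check can be sketched
outside of sheep roughly as below. This is only an illustration of the
"member znode OR queue_pos znode still exists" logic from the patch, not the
sheep code itself. The znode path values, MAX_NODE_STR_LEN, the SKETCH_*
return codes, the in-memory sketch_* helpers, and previous_session_exists()
are all assumptions made up for this example; the real zk_join() calls
zk_node_exists() against a live ZooKeeper session.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed values: stand-ins for the macros in sheep/cluster/zookeeper.c. */
#define MEMBER_ZNODE     "/sheepdog/member"
#define QUEUE_POS_ZNODE  "/sheepdog/queue_pos"
#define MAX_NODE_STR_LEN 256

/* Hypothetical return codes playing the role of ZOK / "no such node". */
enum { SKETCH_ZOK = 0, SKETCH_ZNONODE = -101 };

/* Hypothetical in-memory table of znodes left behind by an old session. */
#define SKETCH_MAX_ZNODES 8
static const char *sketch_znodes[SKETCH_MAX_ZNODES];
static size_t sketch_nr_znodes;

/* Forget all recorded znodes. */
static void sketch_clear_znodes(void)
{
	sketch_nr_znodes = 0;
}

/* Record a znode path; the caller keeps the string alive. */
static void sketch_add_znode(const char *path)
{
	assert(sketch_nr_znodes < SKETCH_MAX_ZNODES);
	sketch_znodes[sketch_nr_znodes++] = path;
}

/* Stand-in for zk_node_exists(): SKETCH_ZOK if the path is recorded. */
static int sketch_node_exists(const char *path)
{
	for (size_t i = 0; i < sketch_nr_znodes; i++)
		if (strcmp(sketch_znodes[i], path) == 0)
			return SKETCH_ZOK;
	return SKETCH_ZNONODE;
}

/*
 * True if either the member znode or the queue_pos znode of this node
 * still exists, i.e. the previous session has not fully expired yet.
 */
static bool previous_session_exists(const char *node_str)
{
	char path[MAX_NODE_STR_LEN];
	int rc1, rc2;

	assert(node_str != NULL);

	snprintf(path, sizeof(path), MEMBER_ZNODE "/%s", node_str);
	rc1 = sketch_node_exists(path);

	snprintf(path, sizeof(path), QUEUE_POS_ZNODE "/%s", node_str);
	rc2 = sketch_node_exists(path);

	return rc1 == SKETCH_ZOK || rc2 == SKETCH_ZOK;
}
```

With only a leftover queue_pos entry recorded, previous_session_exists()
returns true, which corresponds to the case the patch newly rejects; before
the patch only the member znode was checked.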

Tested-by: Masaki Saeki <saeki.masaki at po.ntts.co.jp>

Regards, Saeki.




