[sheepdog] [PATCH] sheep: fix incorrect status transition during joining cluster
Hitoshi Mitake
mitake.hitoshi at gmail.com
Mon Jun 10 06:02:26 CEST 2013
At Sun, 9 Jun 2013 02:18:28 +0900,
Hitoshi Mitake wrote:
>
> Currently, sheep sets jm->cluster_status to SD_STATUS_OK if
> 1. it is the first node to join the cluster, and
> 2. the number of nodes in the latest epoch is 1
>
> This behavior is invalid. For example, it allows the following
> situation:
> 1. create a cluster with 2 nodes
> 2. format the cluster with --copies 2
> 3. kill a single sheep
> 4. shut down the cluster
> 5. launch a single sheep
> 6. the status of the cluster is SD_STATUS_OK
>
> This patch solves the problem. The second condition is now checked
> with have_enough_zones().
>
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
> sheep/group.c | 10 ++--------
> 1 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/sheep/group.c b/sheep/group.c
> index f74ef10..48040bf 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -994,12 +994,10 @@ enum cluster_join_result sd_check_join_cb(const struct sd_node *joining,
> }
>
> if (node_is_local(joining)) {
> - struct sd_node entries[SD_MAX_NODES];
> - int nr_entries;
> uint32_t epoch;
>
> /*
> - * If I'm the first sheep joins in corosync, I
> + * If I'm the first sheep joins the cluster, I
> * becomes the master without sending JOIN.
> */
>
> @@ -1017,14 +1015,10 @@ enum cluster_join_result sd_check_join_cb(const struct sd_node *joining,
> return CJ_RES_FAIL;
> }
>
> - nr_entries = epoch_log_read(epoch, entries, sizeof(entries));
> - if (nr_entries == -1)
> - return CJ_RES_FAIL;
> -
> sys->epoch = epoch;
> jm->ctime = get_cluster_ctime();
>
> - if (nr_entries == 1)
> + if (have_enough_zones())
> jm->cluster_status = SD_STATUS_OK;
> return CJ_RES_SUCCESS;
> }
Sorry, this call to have_enough_zones() is invalid because the sheep
doesn't have current_vnode_info yet at that point.
I'll fix the problem and send v2 later.
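
To make the two issues concrete, here is a rough sketch in plain C. It is
a simplified model only, not sheepdog code: the names status, vnode_info,
old_rule() and zone_rule() are hypothetical stand-ins, and returning
"wait" when no vnode info exists yet is an assumption for illustration,
not necessarily what v2 will do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not sheepdog's real types. */
enum status { STATUS_WAIT, STATUS_OK };

struct vnode_info {
	int nr_zones;	/* number of distinct zones among current members */
};

/*
 * Old rule from the quoted patch: the first joiner marks the cluster OK
 * when the latest epoch contained exactly one node, regardless of how
 * many copies the cluster was formatted with.
 */
enum status old_rule(int nr_entries)
{
	return nr_entries == 1 ? STATUS_OK : STATUS_WAIT;
}

/*
 * Zone-based rule: OK only if there are at least as many zones as copies.
 * A NULL vinfo models a first joiner that has not built its vnode info
 * yet; treating that case as "wait" is an assumption of this sketch.
 */
enum status zone_rule(const struct vnode_info *vinfo, int nr_copies)
{
	if (vinfo == NULL)
		return STATUS_WAIT;
	return vinfo->nr_zones >= nr_copies ? STATUS_OK : STATUS_WAIT;
}
```

In the scenario from the commit message (formatted with --copies 2, latest
epoch has 1 node), old_rule(1) yields STATUS_OK, while zone_rule() with one
zone and two copies yields STATUS_WAIT. With a NULL vinfo, zone_rule() also
yields STATUS_WAIT in this model, so the check needs real vnode info before
it can ever report OK.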
Thanks,
Hitoshi