[Sheepdog] [PATCH v2 3/3] sheep: use SD_STATUS_HALT to stop serving IO
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Wed Oct 12 11:12:46 CEST 2011
At Tue, 11 Oct 2011 17:27:13 +0800,
Liu Yuan wrote:
>
> From: Liu Yuan <tailai.ly at taobao.com>
>
> We use SD_STATUS_HALT to identify the cluster state when it should not serve
> IO requests.
>
> [Test Case]
>
> steps:
>
> for i in 0 1 2 3; do ./sheep/sheep -d /store/$i -z $i -p 700$i; sleep 1; done
> ./collie/collie cluster format --copies=3;
> for i in 0 1; do pkill -f "sheep -d /store/$i"; sleep 1; done
> for i in 2 3; do ./collie/collie cluster info -p 700$i; done
> for i in 0 1; do ./sheep/sheep -d /store/$i -z $i -p 700$i; sleep 1; done
> for i in 0 1 2 3; do ./collie/collie cluster info -p 700$i; done
>
> output:
>
> Cluster status: The node is stopped doing IO, short of living nodes
>
> Creation time Epoch Nodes
> 2011-10-11 16:26:02 3 [192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 2 [192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> Cluster status: The node is stopped doing IO, short of living nodes
>
> Creation time Epoch Nodes
> 2011-10-11 16:26:02 3 [192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 2 [192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-10-11 16:26:02 5 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 4 [192.168.0.1:7000, 192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 3 [192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 2 [192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
> 2011-10-11 16:26:02 1 [192.168.0.1:7000, 192.168.0.1:7001, 192.168.0.1:7002, 192.168.0.1:7003]
>
> ...
>
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
> sheep/group.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
The following test doesn't work in my environment:
$ for i in 0 1; do sheep /store/$i -z $i -p 700$i;sleep 1;done
$ collie cluster format
$ for i in 0; do pkill -f "sheep /store/$i"; sleep 1; done
$ for i in 2; do sheep /store/$i -z $i -p 700$i;sleep 1;done
$ for i in 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
$ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i;sleep 1;done
$ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i;sleep 1;done
$ for i in 0 1 2; do collie cluster info -p 700$i;done
Cluster status: running
Creation time Epoch Nodes
2011-10-12 17:56:12 4 [10.68.14.1:7000, 10.68.14.1:7001]
2011-10-12 17:56:12 3 [10.68.14.1:7001]
2011-10-12 17:56:12 2 [10.68.14.1:7001]
2011-10-12 17:56:12 1 [10.68.14.1:7000, 10.68.14.1:7001]
Cluster status: running
Creation time Epoch Nodes
2011-10-12 17:56:12 4 [10.68.14.1:7000, 10.68.14.1:7001]
2011-10-12 17:56:12 3 [10.68.14.1:7001, 10.68.14.1:7002]
2011-10-12 17:56:12 2 [10.68.14.1:7001]
2011-10-12 17:56:12 1 [10.68.14.1:7000, 10.68.14.1:7001]
failed to connect to localhost:7002, Connection refused
localhost:7002 seems to have a wrong creation time. Perhaps the
master multicasts a wrong join_message when its state is
SD_STATUS_HALT?
Thanks,
Kazutaka
>
> diff --git a/sheep/group.c b/sheep/group.c
> index 2871e97..756f8a6 100644
> --- a/sheep/group.c
> +++ b/sheep/group.c
> @@ -1212,6 +1212,13 @@ static void __sd_notify_done(struct cpg_event *cevent)
>  		}
>  		start_recovery(sys->epoch);
>  	}
> +
> +	if (sys->status == SD_STATUS_HALT) {
> +		int nr_zones = get_zones_nr_from(&sys->sd_node_list);
> +
> +		if (nr_zones >= sys->nr_sobjs)
> +			sys->status = SD_STATUS_OK;
> +	}
>  }
>
> static void sd_notify_handler(struct sheepid *sender, void *msg, size_t msg_len)
> @@ -1451,6 +1458,13 @@ static void __sd_leave_done(struct cpg_event *cevent)
>  	if (node_left &&
>  	    (sys->status == SD_STATUS_OK || sys->status == SD_STATUS_HALT))
>  		start_recovery(sys->epoch);
> +
> +	if (sys->status == SD_STATUS_OK) {
> +		int nr_zones = get_zones_nr_from(&sys->sd_node_list);
> +
> +		if (nr_zones < sys->nr_sobjs)
> +			sys->status = SD_STATUS_HALT;
> +	}
>  }
>
> static void cpg_event_free(struct cpg_event *cevent)
> --
> 1.7.6.1
>
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
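
For reference, the status transition that the quoted patch adds can be sketched as a small standalone C fragment. This is only an illustration under assumptions, not the sheep code: `cluster_status`, `count_zones` and `next_status` are hypothetical names, zones are modelled as plain integers, and `nr_copies` stands in for `sys->nr_sobjs`. The patch itself performs the halt check in `__sd_leave_done()` and the resume check in `__sd_notify_done()`; the sketch folds both into one function.

```c
#include <stddef.h>

/* Hypothetical stand-ins for SD_STATUS_OK / SD_STATUS_HALT. */
enum cluster_status { STATUS_OK, STATUS_HALT };

/* Count distinct zone ids among the live nodes (O(n^2), fine for a sketch). */
static int count_zones(const int *zones, size_t nr_nodes)
{
	int nr = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		size_t j;

		/* Look for an earlier node with the same zone id. */
		for (j = 0; j < i; j++)
			if (zones[j] == zones[i])
				break;
		if (j == i)	/* first occurrence of this zone */
			nr++;
	}
	return nr;
}

/*
 * Halt when fewer zones than copies remain; resume once enough
 * zones are back. Any other combination keeps the current status.
 */
static enum cluster_status next_status(enum cluster_status cur,
				       int nr_zones, int nr_copies)
{
	if (cur == STATUS_OK && nr_zones < nr_copies)
		return STATUS_HALT;
	if (cur == STATUS_HALT && nr_zones >= nr_copies)
		return STATUS_OK;
	return cur;
}
```

With three copies, as in the first test case, two surviving zones would move the status from OK to HALT, and four zones would bring it back to OK.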