[sheepdog] [PATCH] start traversing from a random node in fill_obj_list()
Liu Yuan
namei.unix at gmail.com
Tue May 22 04:49:40 CEST 2012
On 05/21/2012 12:11 PM, levin li wrote:
> From: levin li <xingke.lwp at taobao.com>
>
> Every node has the same sd_node order in its epoch, so in
> fill_obj_list(), every node starts from a same node to request
> the object list, which may cause the node overload.
>
> Indeed, we meet this problem when there's 960 nodes in our
> cluster, when in the period of fill_obj_list, some node get
> 'too many requests' in client_rx_handler(), so I change it
> to start from a random node in fill_obj_list() to make load blance.
>
> Signed-off-by: levin li <xingke.lwp at taobao.com>
> ---
> sheep/recovery.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/sheep/recovery.c b/sheep/recovery.c
> index e31f226..55bb122 100644
> --- a/sheep/recovery.c
> +++ b/sheep/recovery.c
> @@ -699,6 +699,8 @@ static int fill_obj_list(struct recovery_work *rw)
> int retry_cnt;
> struct sd_node *cur = rw->cur_nodes;
> int cur_nr = rw->cur_nr_nodes;
> + int start = random() % cur_nr;
> + int end = cur_nr;
>
> buf = malloc(buf_size);
> if (!buf) {
> @@ -706,7 +708,9 @@ static int fill_obj_list(struct recovery_work *rw)
> rw->retry = 1;
> return -1;
> }
> - for (i = 0; i < cur_nr; i++) {
> +
> +again:
> + for (i = start; i < end; i++) {
> int buf_nr;
> struct sd_node *node = cur + i;
>
> @@ -738,6 +742,12 @@ static int fill_obj_list(struct recovery_work *rw)
> rw->count = merge_objlist(rw->oids, rw->count, (uint64_t *)buf, buf_nr);
> }
>
> + if (start != 0 && !next_rw) {
> + end = start;
> + start = 0;
> + goto again;
> + }
> +
> dprintf("%d\n", rw->count);
> free(buf);
> return 0;
Applied, thanks.
Yuan
More information about the sheepdog
mailing list