[sheepdog] [PATCH RFC 2/2] collie: add a new subcommand "recovery-progress" to node

Liu Yuan namei.unix at gmail.com
Mon Jul 29 10:13:27 CEST 2013


On Mon, Jul 29, 2013 at 04:39:27PM +0900, Hitoshi Mitake wrote:
> This patch adds a new subcommand recovery-progress to node. With this
> subcommand, users can show a progress of recovery process.
> 
> $ sudo collie node recovery-progress
>  99.7 % [==============================================>] 7047 / 7068
> recovery process ends
> 
> The denominator (7068 in the above case) indicates a number of entire
> object which should be checked. The numerator (7047 in the above case)
> indicates a number of objects which is already checked or copied.
> 
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
>  collie/node.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/collie/node.c b/collie/node.c
> index 0cd7e7a..2019c3e 100644
> --- a/collie/node.c
> +++ b/collie/node.c
> @@ -120,6 +120,84 @@ static int node_info(int argc, char **argv)
>  	return EXIT_SUCCESS;
>  }
>  
> +/*
> + * recovery_progress_unit()
> + *
> + * Obtain recovery progress information and return true if the recovery process
> + * ends.
> + */
> +static bool recovery_progress_unit(struct recovery_progress *prog)
> +{
> +	int ret;
> +	bool res = false;

What does 'res' mean? By convention we mostly use 'ret' for the return value.

And I think get_recovery_info() is a better name.

> +	struct sd_req req;
> +
> +	sd_init_req(&req, SD_OP_STAT_RECOVERY);
> +	req.data_length = sizeof(*prog);
> +
> +	ret = collie_exec_req(sdhost, sdport, &req, prog);
> +	switch (ret) {
> +	case SD_RES_SUCCESS:
> +		res = true;
> +		break;
> +	case SD_RES_NODE_IN_RECOVERY:
> +		break;
> +	default:
> +		fprintf(stderr, "obtaining recovery progress fail: %s\n",
> +			sd_strerror(ret));
> +		res = true;
> +		break;
> +	}
> +
> +	return res;
> +}

Put all of the case handling in recovery_progress_unit(); then you don't need
the first call to recovery_progress_unit() outside the while loop.

while (true) {
	if (!get_recovery_info(&info))
		break;
	switch (info.state) {
	}
	sleep;
}
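
To make the single-loop shape concrete, here is a minimal standalone sketch.
It is not the sheepdog code: the enum, the struct layout, the stubbed
get_recovery_info() and the helper poll_recovery() are all assumptions made
up for illustration, and the stub only pretends to query a node. The point is
just that one loop covers both "not recovering" and "recovery finished",
with no extra call before it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the sheepdog definitions, not the real headers. */
enum rw_state { RW_PREPARE_LIST, RW_RECOVER_OBJ, RW_NOTIFY_COMPLETION };

struct recovery_progress {
	enum rw_state state;
	unsigned long nr_recovered_objects;
	unsigned long nr_entire_checked_objects;
};

/*
 * Stub: pretends to ask the node for its recovery state. Returns true
 * while recovery is still running and false once it is over. A real
 * version would send SD_OP_STAT_RECOVERY and report errors here.
 */
static bool get_recovery_info(struct recovery_progress *info)
{
	static unsigned long done;
	const unsigned long total = 3;

	if (done >= total)
		return false;

	info->state = RW_RECOVER_OBJ;
	info->nr_recovered_objects = ++done;
	info->nr_entire_checked_objects = total;
	return true;
}

/*
 * Poll in a single loop, sleeping 'interval' seconds between polls.
 * Returns how many polls saw recovery in progress, so the caller can
 * tell "never recovering" (0) apart from "recovered while watching".
 */
static int poll_recovery(unsigned int interval)
{
	struct recovery_progress info;
	int polls = 0;

	while (true) {
		if (!get_recovery_info(&info))
			break;
		polls++;

		switch (info.state) {
		case RW_PREPARE_LIST:
			printf("\rpreparing a checked object list...");
			break;
		case RW_NOTIFY_COMPLETION:
			printf("\rnotifying a completion of recovery...");
			break;
		case RW_RECOVER_OBJ:
			printf("\r%lu / %lu", info.nr_recovered_objects,
			       info.nr_entire_checked_objects);
			break;
		}
		fflush(stdout);

		sleep(interval);
	}

	return polls;
}
```

With the stub above, poll_recovery(0) sees three polls in progress and then
stops; calling it again returns 0, which is the case where the caller would
print the "isn't doing recovery" message.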

>
> +static int node_recovery_progress(int argc, char **argv)
> +{
> +	struct recovery_progress prog;
> +	bool end;
> +
> +	/*
> +	 * ToDos
> +	 *
> +	 * 1. Calculate size of actually copied objects.
> +	 *    For doing this, not so trivial changes for recovery process is
> +	 *    required.
> +	 *
> +	 * 2. Print remaining physical time.
> +	 *    Even if it is not so acculate, it is helpful for administrators.
> +	 */
> +	end = recovery_progress_unit(&prog);
> +	if (end) {
> +		printf("node %s:%d isn't doing recovery\n", sdhost, sdport);
> +		return EXIT_SUCCESS;
> +	}
> +
> +	do {
> +		end = recovery_progress_unit(&prog);
> +		if (end)
> +			break;
> +
> +		switch (prog.state) {
> +		case RW_PREPARE_LIST:
> +			printf("\rpreparing a checked object list...");
> +			break;
> +		case RW_NOTIFY_COMPLETION:
> +			printf("\rnotifying a completion of recovery...");
> +			break;
> +		case RW_RECOVER_OBJ:
> +			show_progress(prog.nr_recovered_objects,
> +				prog.nr_entire_checked_objects, true);
> +			break;
> +		}
> +
> +		sleep(1);

Since object recovery is a time-consuming, I/O-bound process, a longer sleep
interval would be better.

> +	} while (true);
> +
> +	printf("recovery process ends\n");

When collie returns, that already indicates the operation is done, so this
printf isn't necessary.

Thanks
Yuan


