[sheepdog] [PATCH v2 2/2] collie: add a new option --progress to "node recovery" for showing recovery progress

Hitoshi Mitake mitake.hitoshi at gmail.com
Fri Aug 2 09:15:19 CEST 2013


On Thu, Aug 1, 2013 at 1:42 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On Thu, Aug 01, 2013 at 01:03:09PM +0900, Hitoshi Mitake wrote:
>> This patch adds a new option --progress (or -P) to the node recovery
>> subcommand. With this subcommand, users can show a progress of
>> recovery process.
>>
>> Example:
>> $ sudo collie node recovery --progress
>>  99.7 % [==============================================>] 7047 / 7068
>>
>> The denominator (7068 in the above case) indicates a number of entire
>> object which should be checked. The numerator (7047 in the above case)
>> indicates a number of objects which is already checked or copied.
>>
>> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>> ---
>> v2:
>>  - make this feature as an option of "node recovery", not a new subcommand
>>  - clean coding style
>>  -- renaming recovery_progress_unit() -> get_recovery_progress()
>>
>>  collie/node.c |  110 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 108 insertions(+), 2 deletions(-)
>>
>> diff --git a/collie/node.c b/collie/node.c
>> index 69229f4..a1392b0 100644
>> --- a/collie/node.c
>> +++ b/collie/node.c
>> @@ -13,6 +13,7 @@
>>
>>  static struct node_cmd_data {
>>       bool all_nodes;
>> +     bool recovery_progress;
>>  } node_cmd_data;
>>
>>  static void cal_total_vdi_size(uint32_t vid, const char *name, const char *tag,
>> @@ -120,10 +121,111 @@ static int node_info(int argc, char **argv)
>>       return EXIT_SUCCESS;
>>  }
>>
>> +/*
>> + * get_recovery_progress()
>> + *
>> + * Returned values:
>> + * -1 ... request failed
>> + *  0 ... recovery ended
>> + *  1 ... recovery is continuing
>> + */
>> +static bool get_recovery_progress(struct recovery_progress *prog)
>> +{
>
> bool means true or false.

Sorry about that. I forgot to change the return type; it should be
int, since the function returns -1, 0, or 1.

>
>> +     int ret;
>> +     struct sd_req req;
>> +     struct sd_rsp *rsp = (struct sd_rsp *)&req;
>> +
>> +     sd_init_req(&req, SD_OP_STAT_RECOVERY);
>> +     req.data_length = sizeof(*prog);
>> +
>> +     ret = collie_exec_req(sdhost, sdport, &req, prog);
>> +     if (ret < 0) {
>> +             fprintf(stderr, "Failed to execute request\n");
>> +             ret = -1;
>> +             goto out;
>> +     }
>> +
>> +     switch (rsp->result) {
>> +     case SD_RES_SUCCESS:
>> +             ret = 0;
>> +             break;
>> +     case SD_RES_NODE_IN_RECOVERY:
>> +             ret = 1;
>> +             break;
>> +     default:
>> +             fprintf(stderr, "obtaining recovery progress fail: %s\n",
>> +                     sd_strerror(ret));
>> +             ret = -1;
>> +             break;
>> +     }
>> +
>> +out:
>> +     return ret;
>> +}
>> +
>> +static int node_recovery_progress(void)
>> +{
>> +     int status, prev_status = -2;
>> +
>> +     /*
>> +      * prev_status is required for expressing state transition, and -2
>> +      * indicates the previous state is not initialized
>> +      */
>> +
>> +     /*
>> +      * ToDos
>> +      *
>> +      * 1. Calculate size of actually copied objects.
>> +      *    For doing this, not so trivial changes for recovery process are
>> +      *    required.
>> +      *
>> +      * 2. Print remaining physical time.
>> +      *    Even if it is not so acculate, it is helpful for administrators.
>> +      */
>> +
>> +     do {
>> +             struct recovery_progress prog;
>> +
>> +             status = get_recovery_progress(&prog);
>> +             if (status != 1) {
>> +                     if (status == 0 && prev_status != -2)
>> +                             /* not an immediate completion */
>> +                             show_progress(prog.nr_total, prog.nr_total,
>> +                                     true);
>> +
>> +                     break;
>> +             }
>> +
>
> I can't understand what you are doing here. Just have get_recovery_progress()
> return true:node_in_recovery, false:node_not_in_recovery, isn't enough?

If we do that, we have to call get_recovery_progress() outside of the
loop at least once.
With only a true/false result, a single call cannot distinguish the
two cases below, because both of them report "not in recovery":
1. sheep was not doing recovery in the first place
2. sheep has just finished recovery

We need different output for these two cases, so we need the variable
prev_status to remember whether a "recovery is continuing" state was
seen before.
However, the -2 is unnecessary in my latest patchset. I'll remove it in
the next version.
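
To make the state transition concrete, here is a minimal standalone
sketch. It is not the code in the patch: the enum names,
watch_recovery() and the replay stub are hypothetical, and the stub
only stands in for get_recovery_progress() so that the loop can run
without a sheep daemon. Progress display and any waiting between
polls are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; values mirror the -1 / 0 / 1 comment in the patch. */
enum recovery_state {
	RECOVERY_FAILED = -1,	/* request failed */
	RECOVERY_DONE = 0,	/* node is not (or no longer) recovering */
	RECOVERY_RUNNING = 1,	/* recovery is continuing */
};

enum watch_result {
	WATCH_ERROR,		/* a request failed */
	WATCH_NEVER_RECOVERING,	/* case 1: no recovery at the first poll */
	WATCH_FINISHED,		/* case 2: recovery was seen, then ended */
};

/* Callback standing in for get_recovery_progress(). */
typedef enum recovery_state (*poll_fn)(void *ctx);

/*
 * Poll until recovery is no longer running.  seen_running plays the
 * role of prev_status: it records whether an earlier poll reported
 * "recovery is continuing", which is the only way to tell case 1
 * from case 2 when the current poll says "not in recovery".
 */
static enum watch_result watch_recovery(poll_fn poll, void *ctx)
{
	int seen_running = 0;

	for (;;) {
		enum recovery_state st = poll(ctx);

		if (st == RECOVERY_FAILED)
			return WATCH_ERROR;
		if (st == RECOVERY_DONE)
			return seen_running ? WATCH_FINISHED
					    : WATCH_NEVER_RECOVERING;
		seen_running = 1;
	}
}

/* Test stub: replays a fixed script of states, one per poll. */
struct replay {
	const enum recovery_state *states;
	size_t len;
	size_t pos;
};

static enum recovery_state replay_poll(void *ctx)
{
	struct replay *r = ctx;

	assert(r->pos < r->len);	/* script must end with DONE or FAILED */
	return r->states[r->pos++];
}
```

With this shape, a script of { DONE } gives WATCH_NEVER_RECOVERING,
while { RUNNING, RUNNING, DONE } gives WATCH_FINISHED, so the caller
can print the final 100% line only in the second case.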

Thanks,
Hitoshi


