[sheepdog] [PATCH v5 1/2] sheep: enhance STAT_RECOVERY for prividing information of recovery state

Hitoshi Mitake mitake.hitoshi at gmail.com
Mon Aug 5 06:05:32 CEST 2013


At Mon, 5 Aug 2013 10:53:19 +0800,
Liu Yuan wrote:
> 
> On Mon, Aug 05, 2013 at 10:16:27AM +0900, Hitoshi Mitake wrote:
> > At Sun, 4 Aug 2013 22:47:21 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Sun, Aug 04, 2013 at 05:30:21PM +0900, Hitoshi Mitake wrote:
> > > > From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > 
> > > > Current SD_OP_STAT_RECOVERY only provides an information that
> > > > indicates the node is doing recovery or not. For providing more useful
> > > > information about progress of recovery, this patch enhances this
> > > > request.
> > > > 
> > > > The enhanced SD_OP_STAT_RECOVERY returns the information with struct
> > > > recovery_state. The struct contains these information:
> > > > 1. a flag which indicates a sheep process is in progress or not
> > > > 2. state of recovery. With this information, collie can know which
> > > >    phase of progress sheep is in.
> > > > 3. Number of copied object during a recovery process.
> > > > 4. Number of entire objects which should be copied during a recovery
> > > >    process.
> > > > 
> > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > ---
> > > > v4:
> > > >  - change type of recovery_state->in_progress from uint32_t to uint8_t
> > > > 
> > > > v3:
> > > >  - clean coding style
> > > > 
> > > > v2:
> > > >  - make this feature as an option of "node recovery", not a new subcommand
> > > >  - clean coding style
> > > >  -- renaming recovery_progress_unit() -> get_recovery_progress()
> > > > 
> > > >  collie/node.c            |    8 +++++---
> > > >  include/internal_proto.h |   13 +++++++++++++
> > > >  sheep/ops.c              |    4 ++--
> > > >  sheep/recovery.c         |   22 ++++++++++++++++------
> > > >  sheep/sheep_priv.h       |    1 +
> > > >  5 files changed, 37 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/collie/node.c b/collie/node.c
> > > > index 69229f4..4230af5 100644
> > > > --- a/collie/node.c
> > > > +++ b/collie/node.c
> > > > @@ -132,17 +132,19 @@ static int node_recovery(int argc, char **argv)
> > > >  	for (i = 0; i < sd_nodes_nr; i++) {
> > > >  		char host[128];
> > > >  		struct sd_req req;
> > > > -		struct sd_rsp *rsp = (struct sd_rsp *)&req;
> > > > +		struct recovery_state state;
> > > >  
> > > > +		memset(&state, 0, sizeof(state));
> > > >  		addr_to_str(host, sizeof(host), sd_nodes[i].nid.addr, 0);
> > > >  
> > > >  		sd_init_req(&req, SD_OP_STAT_RECOVERY);
> > > > +		req.data_length = sizeof(state);
> > > >  
> > > > -		ret = collie_exec_req(host, sd_nodes[i].nid.port, &req, NULL);
> > > > +		ret = collie_exec_req(host, sd_nodes[i].nid.port, &req, &state);
> > > >  		if (ret < 0)
> > > >  			return EXIT_SYSFAIL;
> > > >  
> > > > -		if (rsp->result == SD_RES_NODE_IN_RECOVERY) {
> > > > +		if (state.in_recovery) {
> > > >  			addr_to_str(host, sizeof(host),
> > > >  					sd_nodes[i].nid.addr, sd_nodes[i].nid.port);
> > > >  			printf(raw_output ? "%d %s %d %d\n" : "%4d   %-20s%5d%11d\n",
> > > > diff --git a/include/internal_proto.h b/include/internal_proto.h
> > > > index 0061007..e1424f2 100644
> > > > --- a/include/internal_proto.h
> > > > +++ b/include/internal_proto.h
> > > > @@ -192,4 +192,17 @@ static inline __attribute__((used)) void __sd_epoch_format_build_bug_ons(void)
> > > >  	BUILD_BUG_ON(sizeof(struct sd_node) != SD_NODE_SIZE);
> > > >  }
> > > >  
> > > > +enum rw_state {
> > > > +	RW_PREPARE_LIST, /* the recovery thread is preparing object list */
> > > > +	RW_RECOVER_OBJ, /* the thread is recoering objects */
> > > > +	RW_NOTIFY_COMPLETION, /* the thread is notifying recovery completion */
> > > > +};
> > > > +
> > > > +struct recovery_state {
> > > > +	uint8_t in_recovery;
> > > > +	enum rw_state state;
> > > > +	uint32_t nr_finished;
> > > > +	uint32_t nr_total;
> > > 
> > > I think we should set uint64_t for both nr_finished and nr_total;
> > 
> > These variables are corresponding to done and count of
> > recovery_info. These are typed as int and uint32_t. So I think
> > nr_finished and nr_total shouldn't be typed as uint64_t.
> 
> This means we can only support up to max of uint32_t objects. This is bug, fix
> fix it and then change nr_xxx to type uint64_t.

OK, I'll prepare a new patch for solving this problem in the next version.

Thanks,
Hitoshi



More information about the sheepdog mailing list