[sheepdog] [PATCH] sheep: don't clean stale dir if there are no enough nodes

Wed Dec 10 13:33:34 CET 2014

At Fri, 5 Dec 2014 12:05:26 +0800 (GMT+08:00),
Yang Zhang wrote:
> 
> Hi Hitoshi,
> 
> I've tested the patch. It didn't solve the problem: 'dog vdi list' still shows object not found.
> Actually, it didn't clean the objects saved in the .stale dir, but it didn't recover them back to obj/ either.

Yang, long,

Thanks for testing. It worked well in my environment, so I think
we tested differently. I'll share my testing method later.

> 
> Also, I wonder: even if we recover the objects in the .stale dir, will they be the newest version?

Yes, that problem remains. The current behavior of sheepdog is odd. When
nr_zones < the maximum nr_copies, it should stop with the status
SD_STATUS_WAIT, as in the initialization sequence. In addition, if the
cluster is already SD_STATUS_OK, a newly joining node shouldn't provide
its objects to the recovery process, because all objects are already
replicated correctly on the existing nodes.
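
A rough sketch of the behavior I have in mind follows. Every name in
it (the SKETCH_ enum, struct sketch_cluster, both functions) is
hypothetical and only stands in for the real SD_STATUS_* handling; it
is not existing sheep code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder statuses; the real SD_STATUS_* constants live in sheepdog. */
enum sketch_status { SKETCH_STATUS_OK, SKETCH_STATUS_WAIT };

/* Hypothetical summary of the cluster at an epoch change. */
struct sketch_cluster {
	int nr_zones;		/* zones that contain storage nodes */
	int max_nr_copies;	/* largest replica count of any VDI */
};

/* Wait, instead of running, when replicas cannot all be placed. */
static enum sketch_status sketch_next_status(const struct sketch_cluster *c)
{
	assert(c != NULL);
	if (c->nr_zones < c->max_nr_copies)
		return SKETCH_STATUS_WAIT;
	return SKETCH_STATUS_OK;
}

/*
 * A joining node offers its local objects to recovery only when the
 * cluster is not already running normally.
 */
static bool sketch_joiner_provides_objects(enum sketch_status current)
{
	return current != SKETCH_STATUS_OK;
}
```

With 2 zones and 3 copies this yields the wait status, and a node
joining a cluster that is already OK would not offer its objects.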

BTW, what do you think about this idea: simply killing the gateway nodes
when an epoch becomes gateway only? It would solve the problem simply,
and it doesn't hurt VMs because QEMU (and tgt) already have a
reconnection feature. A gateway-only cluster can't serve reads or
writes, so simply stopping it seems like a reasonable idea to me.
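
The check itself could look roughly like the sketch below. It assumes
that a gateway node is one with no vnodes (nr_vnodes == 0); struct
sketch_node and both functions are hypothetical names for
illustration, not sheepdog's struct sd_node or any existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node record. */
struct sketch_node {
	int nr_vnodes;	/* assumed to be 0 for a gateway-only node */
};

/* True when no node in the epoch stores objects (also for an empty list). */
static bool sketch_epoch_is_gateway_only(const struct sketch_node *nodes,
					 size_t nr_nodes)
{
	assert(nodes != NULL || nr_nodes == 0);
	for (size_t i = 0; i < nr_nodes; i++)
		if (nodes[i].nr_vnodes > 0)
			return false;
	return true;
}

/* A gateway would stop itself once the whole epoch is gateway only. */
static bool sketch_gateway_should_exit(const struct sketch_node *self,
				       const struct sketch_node *nodes,
				       size_t nr_nodes)
{
	assert(self != NULL);
	return self->nr_vnodes == 0 &&
		sketch_epoch_is_gateway_only(nodes, nr_nodes);
}
```

A storage node never exits under this rule; only gateways do, and only
when no storage node is left in the epoch.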

My company doesn't use the gateway feature, so I'd like to hear your
opinion.

Thanks,
Hitoshi

> 
> Thanks,
> Yang
> 
> > -----åŽŸå§‹é‚®ä»¶-----
> > å‘ä»¶äºº: "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>
> > å‘é€æ—¶é—´: 2014å¹´12æœˆ4æ—¥ æ˜ŸæœŸå›> > æ”¶ä»¶äºº: sheepdog at lists.wpkg.org
> > æŠ„é€: mitake.hitoshi at gmail.com, "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>, duron800 at qq.com, "å¼ æ‰¬" <3100100878 at zju.edu.cn>, "å¾å°ï¿½éœœ" <nxtxiaolong at gmail.com>
> > ä¸»é¢˜: Re: [PATCH] sheep: don't clean stale dir if there are no enough nodes
> > 
> > At Thu,  4 Dec 2014 16:05:39 +0900,
> > Hitoshi Mitake wrote:
> > > 
> > > The current recovery process has a data-wipe bug. After an epoch that
> > > consists only of gateway nodes, objects stored on dying nodes will be
> > > wiped when the nodes join the cluster. This patch solves the
> > > problem by removing the invalid call of sd_store->cleanup() during
> > > recovery completion.
> > > 
> > > Related issue:
> > > https://bugs.launchpad.net/sheepdog-project/+bug/1327037
> > > 
> > > Cc: duron800 at qq.com
> > > Cc: ï¿½UEQo <3100100878 at zju.edu.cn>
> > > Cc: å¾å°ï¿½éœœ <nxtxiaolong at gmail.com>
> > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > ---
> > >  sheep/ops.c        |  5 +++--
> > >  sheep/sheep_priv.h |  1 +
> > >  sheep/vdi.c        | 12 ++++++++++++
> > >  3 files changed, 16 insertions(+), 2 deletions(-)
> > 
> > ï¿½UEQo, å¾å°ï¿½éœœ, could you test this patch if you have time? It would be
> > the simplest solution for the problem.
> > 
> > Thanks,
> > Hitoshi
> > 
> > > 
> > > diff --git a/sheep/ops.c b/sheep/ops.c
> > > index a617a83..b418bda 100644
> > > --- a/sheep/ops.c
> > > +++ b/sheep/ops.c
> > > @@ -726,8 +726,9 @@ static int cluster_recovery_completion(const struct sd_req *req,
> > >  			sd_notice("all nodes are recovered, epoch %d", epoch);
> > >  			last_gathered_epoch = epoch;
> > >  			/* sd_store can be NULL if this node is a gateway */
> > > -			if (vnode_info->nr_zones >= ec_max_data_strip &&
> > > -			    sd_store && sd_store->cleanup)
> > > +			if (vnode_info->nr_zones >=
> > > +			    max(ec_max_data_strip, max_nr_copies)
> > > +			    && sd_store && sd_store->cleanup)
> > >  				sd_store->cleanup();
> > >  		}
> > >  	}
> > > diff --git a/sheep/sheep_priv.h b/sheep/sheep_priv.h
> > > index 5fc6b90..699f352 100644
> > > --- a/sheep/sheep_priv.h
> > > +++ b/sheep/sheep_priv.h
> > > @@ -357,6 +357,7 @@ int inode_coherence_update(uint32_t vid, bool validate,
> > >  void remove_node_from_participants(const struct node_id *left);
> > >  
> > >  extern int ec_max_data_strip;
> > > +extern int max_nr_copies;
> > >  
> > >  int read_vdis(char *data, int len, unsigned int *rsp_len);
> > >  int read_del_vdis(char *data, int len, unsigned int *rsp_len);
> > > diff --git a/sheep/vdi.c b/sheep/vdi.c
> > > index 1c8fb36..d815196 100644
> > > --- a/sheep/vdi.c
> > > +++ b/sheep/vdi.c
> > > @@ -40,6 +40,12 @@ static struct sd_rw_lock vdi_state_lock = SD_RW_LOCK_INITIALIZER;
> > >   */
> > >  int ec_max_data_strip;
> > >  
> > > +/*
> > > + * max_nr_copies is the maximum number of copies among replicated VDIs. It is
> > > + * used for the same purpose as ec_max_data_strip.
> > > + */
> > > +int max_nr_copies;
> > > +
> > >  int sheep_bnode_writer(uint64_t oid, void *mem, unsigned int len,
> > >  		       uint64_t offset, uint32_t flags, int copies,
> > >  		       int copy_policy, bool create, bool direct)
> > > @@ -171,6 +177,12 @@ int add_vdi_state(uint32_t vid, int nr_copies, bool snapshot, uint8_t cp)
> > >  		sd_mutex_lock(&m);
> > >  		ec_max_data_strip = max(d, ec_max_data_strip);
> > >  		sd_mutex_unlock(&m);
> > > +	} else {
> > > +		static struct sd_mutex m = SD_MUTEX_INITIALIZER;
> > > +
> > > +		sd_mutex_lock(&m);
> > > +		max_nr_copies = max(nr_copies, max_nr_copies);
> > > +		sd_mutex_unlock(&m);
> > >  	}
> > >  
> > >  	sd_debug("%" PRIx32 ", %d, %d", vid, nr_copies, cp);
> > > -- 
> > > 1.8.3.2
> > > 
>
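
For reference, the guard that the quoted patch puts in front of
sd_store->cleanup() can be restated as a standalone predicate. This
is only an illustrative sketch: sketch_max() and
sketch_may_cleanup_stale() are hypothetical names, and the boolean
parameter stands in for the "sd_store && sd_store->cleanup" check.

```c
#include <assert.h>
#include <stdbool.h>

static int sketch_max(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Stale objects may be cleaned only when there are at least as many
 * zones as the larger of the two requirements, and the node has a
 * store with a cleanup handler (gateways have none).
 */
static bool sketch_may_cleanup_stale(int nr_zones, int ec_max_data_strip,
				     int max_nr_copies, bool has_store_cleanup)
{
	assert(nr_zones >= 0);
	return nr_zones >= sketch_max(ec_max_data_strip, max_nr_copies) &&
		has_store_cleanup;
}
```

With no erasure-coded VDIs (ec_max_data_strip == 0), zero zones, and
3-copy VDIs, the old comparison 0 >= 0 passes, while this predicate
compares 0 >= 3 and refuses the cleanup.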