[sheepdog] [PATCH 6/6] sheep: clear bit in vdi_deleted if vdi is created

Tue Mar 17 07:01:43 CET 2015

On Tue, Mar 17, 2015 at 02:51:32PM +0900, Hitoshi Mitake wrote:
> At Tue, 17 Mar 2015 13:40:51 +0800,
> Liu Yuan wrote:
> > 
> > On Tue, Mar 17, 2015 at 01:32:44PM +0800, Liu Yuan wrote:
> > > On Tue, Mar 17, 2015 at 01:16:00PM +0800, Liu Yuan wrote:
> > > > On Tue, Mar 17, 2015 at 02:03:03PM +0900, Hitoshi Mitake wrote:
> > > > > At Tue, 17 Mar 2015 13:58:08 +0900,
> > > > > Hitoshi Mitake wrote:
> > > > > > 
> > > > > > At Tue, 17 Mar 2015 13:51:48 +0900,
> > > > > > Hitoshi Mitake wrote:
> > > > > > > 
> > > > > > > At Tue, 17 Mar 2015 12:42:35 +0800,
> > > > > > > Liu Yuan wrote:
> > > > > > > > 
> > > > > > > > On Mon, Mar 16, 2015 at 08:49:34PM +0800, Liu Yuan wrote:
> > > > > > > > > From: Liu Yuan <liuyuan at cmss.chinamobile.com>
> > > > > > > > > 
> > > > > > > > > This patch fixes following problem:
> > > > > > > > > 
> > > > > > > > > $ dog vdi create test 100M
> > > > > > > > > $ dog vdi delete test
> > > > > > > > > $ dog vdi create test 200M
> > > > > > > > > $ dog vdi list # expect show test, but nothing shows out.
> > > > > > > > > 
> > > > > > > > > Which was brought by
> > > > > > > > > 
> > > > > > > > > *commit f68feab7edc0ded86701a2e902d85616b24942ab
> > > > > > > > > *Author: Saeki Masaki <saeki.masaki at po.ntts.co.jp>
> > > > > > > > > *Date:   Wed Nov 26 10:50:46 2014 +0900
> > > > > > > > > 
> > > > > > > > >     sheep/dog: introduce new bitmap for delete vdi
> > > > > > > > > 
> > > > > > > > > Cc: Saeki Masaki <saeki.masaki at po.ntts.co.jp>
> > > > > > > > > Signed-off-by: Liu Yuan <liuyuan at cmss.chinamobile.com>
> > > > > > > > > ---
> > > > > > > > >  sheep/ops.c | 7 +++++--
> > > > > > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/sheep/ops.c b/sheep/ops.c
> > > > > > > > > index 8a0f77c..464ae0b 100644
> > > > > > > > > --- a/sheep/ops.c
> > > > > > > > > +++ b/sheep/ops.c
> > > > > > > > > @@ -130,11 +130,14 @@ static int post_cluster_new_vdi(const struct sd_req *req, struct sd_rsp *rsp,
> > > > > > > > >  	unsigned long nr = rsp->vdi.vdi_id;
> > > > > > > > >  	int ret = rsp->result;
> > > > > > > > >  
> > > > > > > > > -	sd_info("req->vdi.base_vdi_id: %x, rsp->vdi.vdi_id: %x", req->vdi.base_vdi_id, rsp->vdi.vdi_id);
> > > > > > > > > +	sd_info("req->vdi.base_vdi_id: %x, rsp->vdi.vdi_id: %x",
> > > > > > > > > +		req->vdi.base_vdi_id, rsp->vdi.vdi_id);
> > > > > > > > >  
> > > > > > > > >  	sd_debug("done %d %lx", ret, nr);
> > > > > > > > > -	if (ret == SD_RES_SUCCESS)
> > > > > > > > > +	if (ret == SD_RES_SUCCESS) {
> > > > > > > > >  		atomic_set_bit(nr, sys->vdi_inuse);
> > > > > > > > > +		atomic_clear_bit(nr, sys->vdi_deleted);
> > > > > > > > > +	}
> > > > > > > > >  
> > > > > > > > >  	return ret;
> > > > > > > > >  }
> > > > > > > > > -- 
> > > > > > > > > 1.9.1
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > At least this patch fixes a real problem and should be applied, Hitoshi?
> > > > > > > > 
> > > > > > > > Yuan
> > > > > > > 
> > > > > > > Applied this one, thanks.
> > > > > > 
> > > > > > On the second thought, this problem cannot happen on the current
> > > > > > master. It is caused by the old invalid recycling. So I'll revert
> > > > > > it, OK?
> > > > > > # seems that there's no patches after this one, I'll simply rebase and
> > > > > > # update forcibly
> > > > > 
> > > > > Removed this one, sorry for confusion.
> > > > 
> > > > Current master actually have this problem because bit in vdi_deleted never get
> > > > cleared.
> > > 
> > > If you disable vid_gc(), this problem will show up. This patch actually fix a
> > > broader bug. For every newly created vid, either fresh, snapshot, clone, we
> > > should clear it in vdi_deleted.
> > 
> > The master dosen't have this problem because we don't reuse vid. If gc is
> > disabled, we should logically first clear the bit in the deleted because we
> > actually create a new one. This will prevent any future bugs and will allow our
> > in-house patch, which reuse vid as old algorithm work correctly.
> 
> If you really want to use the old recycling algorithm, a good place
> for clearing the vdi_deleted bitmap must be here:

I don't think so. current master has this consistency problem if gc is not
enabled:

after 3 dog command:

 dog vdi create test 100M
 dog vdi delete test
 dog vdi create test 100M

 suppose bit A represent test,

 bit A is set in vdi_inuse, which means it is occupied.
 bit A is also set in vdi_deleted, which means it is deleted already

is this logically correct?

> 
> +		if (vdi_is_deleted(inode)) {
> +			/* Recycle the deleted inode for fresh vdi create */
> +			if (!iocb->create_snapshot)
> +				info->free_bit = i;
> 				/* clear vdi_deleted bit */
> +			continue;
> +		}
> 
> To be honest, using the old algorithm is a bad idea. The data
> corruption might be caused by operation miss and bugs in control plane
> (e.g. OpenStack Cinder). Because the corruption is invoked by wrong
> sequence of VDI creation, which cannot be prevented by sheep.

It is not a problem as big as all node die, at least for us. And old algorithm
is good enough for us, especially its stability and simplicity.

These bad operation can be prevented in upper layer. There are tons of ways to
kill sheep cluster by false operation, e,g, a normal user can issue dog command.
which can easily kill a sheepdog cluster.

Thanks,
Yuan