[sheepdog] [PATCH v3 3/3] dog: use a minimum number of copies and zones for vdi checking

Liu Yuan namei.unix at gmail.com
Thu Jan 16 07:53:23 CET 2014


On Wed, Jan 15, 2014 at 06:12:18PM +0900, Hitoshi Mitake wrote:
> At Mon, 13 Jan 2014 19:03:37 +0900,
> Hitoshi Mitake wrote:
> > 
> > At Mon, 13 Jan 2014 17:09:45 +0800,
> > Liu Yuan wrote:
> > > 
> > > On Sun, Jan 12, 2014 at 08:12:02PM +0900, Hitoshi Mitake wrote:
> > > > At Sun, 12 Jan 2014 18:39:29 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > On Sun, Jan 12, 2014 at 12:47:14AM +0900, Hitoshi Mitake wrote:
> > > > > > At Sat, 11 Jan 2014 23:06:22 +0800,
> > > > > > Liu Yuan wrote:
> > > > > > > 
> > > > > > > On Sun, Jan 12, 2014 at 12:02:39AM +0900, Hitoshi Mitake wrote:
> > > > > > > > At Sat, 11 Jan 2014 22:58:24 +0800,
> > > > > > > > Liu Yuan wrote:
> > > > > > > > > 
> > > > > > > > > On Sat, Jan 11, 2014 at 11:44:29PM +0900, Hitoshi Mitake wrote:
> > > > > > > > > > From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > > > > > > 
> > > > > > > > > > Current sheepdog allows the condition of a number of zones < a number
> > > > > > > > > > of copies. But "dog vdi check" cannot handle the case well because it
> > > > > > > > > > tries to create copies of lost/corrupted objects based on
> > > > > > > > > > inode->nr_copies.
> > > > > > > > > > 
> > > > > > > > > > This patch lets "dog vdi check" create a minimum number of copies and
> > > > > > > > > > zones.
> > > > > > > > > > 
> > > > > > > > > > Reported-by: Marcin Mirosław <marcin at mejor.pl>
> > > > > > > > > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > > > > > > > ---
> > > > > > > > > > 
> > > > > > > > > > v2: renameing, zones_nr -> sd_zones_nr
> > > > > > > > > > 
> > > > > > > > > >  dog/vdi.c | 19 ++++++++++++-------
> > > > > > > > > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/dog/vdi.c b/dog/vdi.c
> > > > > > > > > > index 8fd4664..4847350 100644
> > > > > > > > > > --- a/dog/vdi.c
> > > > > > > > > > +++ b/dog/vdi.c
> > > > > > > > > > @@ -1800,11 +1800,11 @@ static void vdi_check_object_main(struct work *work)
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > >  static void queue_vdi_check_work(const struct sd_inode *inode, uint64_t oid,
> > > > > > > > > > -				 uint64_t *done, struct work_queue *wq)
> > > > > > > > > > +				 uint64_t *done, struct work_queue *wq,
> > > > > > > > > > +				 int nr_copies)
> > > > > > > > > >  {
> > > > > > > > > >  	struct vdi_check_info *info;
> > > > > > > > > >  	const struct sd_vnode *tgt_vnodes[SD_MAX_COPIES];
> > > > > > > > > > -	int nr_copies = inode->nr_copies;
> > > > > > > > > >  
> > > > > > > > > >  	info = xzalloc(sizeof(*info) + sizeof(info->vcw[0]) * nr_copies);
> > > > > > > > > >  	info->oid = oid;
> > > > > > > > > > @@ -1830,6 +1830,7 @@ struct check_arg {
> > > > > > > > > >  	const struct sd_inode *inode;
> > > > > > > > > >  	uint64_t *done;
> > > > > > > > > >  	struct work_queue *wq;
> > > > > > > > > > +	int nr_copies;
> > > > > > > > > >  };
> > > > > > > > > >  
> > > > > > > > > >  static void check_cb(void *data, enum btree_node_type type, void *arg)
> > > > > > > > > > @@ -1844,7 +1845,8 @@ static void check_cb(void *data, enum btree_node_type type, void *arg)
> > > > > > > > > >  			oid = vid_to_data_oid(ext->vdi_id, ext->idx);
> > > > > > > > > >  			*(carg->done) = (uint64_t)ext->idx * SD_DATA_OBJ_SIZE;
> > > > > > > > > >  			vdi_show_progress(*(carg->done), carg->inode->vdi_size);
> > > > > > > > > > -			queue_vdi_check_work(carg->inode, oid, NULL, carg->wq);
> > > > > > > > > > +			queue_vdi_check_work(carg->inode, oid, NULL, carg->wq,
> > > > > > > > > > +					     carg->nr_copies);
> > > > > > > > > >  		}
> > > > > > > > > >  	}
> > > > > > > > > >  }
> > > > > > > > > > @@ -1855,8 +1857,9 @@ int do_vdi_check(const struct sd_inode *inode)
> > > > > > > > > >  	uint64_t done = 0, oid;
> > > > > > > > > >  	uint32_t vid;
> > > > > > > > > >  	struct work_queue *wq;
> > > > > > > > > > +	int nr_copies = min((int)inode->nr_copies, sd_zones_nr);
> > > > > > > > > >  
> > > > > > > > > > -	if (sd_nodes_nr < inode->nr_copies) {
> > > > > > > > > > +	if (sd_nodes_nr < nr_copies) {
> > > > > > > > > 
> > > > > > > > > I think we should check (sd_zones_nr < nr_copies) here
> > > > > > > > 
> > > > > > > > The comparison is already done in the above min(). Why should we check
> > > > > > > > it here?
> > > > > > > 
> > > > > > > then we should remove this check since it will never happen or we can leave it a
> > > > > > > safe guard here. Either way, since zone is the true entity, not 'node', so what
> > > > > > > we should really abort here is the condition that zones_nr < copies, no?
> > > > > > 
> > > > > > The condition sd_nodes_nr < nr_copies can be true when a number of
> > > > > > living nodes is less than zones or copies. In such a case, we cannot
> > > > > > recover objects and the panic problem rises.
> > > > > 
> > > > > sd_nodes_nr is always >= sd_zones_nr, no? We use zone to indicate a sheep entity
> > > > > for data distribution, not node(this is what your patch actually try to solve).
> > > > > So we should panic out only when sd_zones_nr < nr_copies (for now this should
> > > > > not happen, but we can have this for safe guard of code).
> > > > 
> > > > sd_zones_nr < nr_copies can happen because the past "unsafe" mode is
> > > > default now. And Marcin's situation was it.
> > > > 
> > > 
> > > After a second thought, I think we don't need this patch and instead, we should
> > > simply handle it as before:
> > > 
> > > - abort when sd_zones_nr < nr_copies as we did it before.
> > > - it is very difficult to handle 'vdi check' for erasure code when zones_nr < nr_copies
> > 
> > I can add a conditional branch based on copy_policy for erasure coded vdis, but
> >

At least you need to abort for erasure coded vdi if zones_nr < copy_nr if you
don't implement a full check for such case.

Thanks
Yuan




More information about the sheepdog mailing list