[sheepdog] [PATCH] add new function to fulfill parallelly dicarding

Liu Yuan namei.unix at gmail.com
Tue Dec 17 15:08:03 CET 2013


On Tue, Dec 17, 2013 at 08:02:02PM +0800, Robin Dong wrote:
> The new function sd_inode_write_vid_only will only write 'vid' in 'idx' for
> non-hyper-volume vid. For hyper-volume vdi, we write the 'vid' in function
> sd_inode_set_vid() only when this 'vid' is exists.
> 
> Signed-off-by: Robin Dong <sanbai at taobao.com>
> ---
>  include/sheepdog_proto.h |    3 +++
>  lib/sd_inode.c           |   28 ++++++++++++++++++++++++++++
>  sheep/ops.c              |    4 ++--
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
> index 6a7ada7..22d6f73 100644
> --- a/include/sheepdog_proto.h
> +++ b/include/sheepdog_proto.h
> @@ -282,6 +282,9 @@ extern void sd_inode_set_vid(write_node_fn writer, read_node_fn reader,
>  			     uint32_t vdi_id);
>  extern int sd_inode_write(write_node_fn writer, struct sd_inode *inode,
>  			  int flags, bool create, bool direct);
> +extern int sd_inode_write_vid_only(write_node_fn writer, struct sd_inode *inode,
> +				   uint32_t idx, uint32_t vid, int flags,
> +				   bool create, bool direct);
>  extern int sd_inode_write_vid(write_node_fn writer, struct sd_inode *inode,
>  			      uint32_t idx, uint32_t vid, uint32_t value,
>  			      int flags, bool create, bool direct);
> diff --git a/lib/sd_inode.c b/lib/sd_inode.c
> index 0043981..fb689d5 100644
> --- a/lib/sd_inode.c
> +++ b/lib/sd_inode.c
> @@ -585,6 +585,15 @@ void sd_inode_set_vid(write_node_fn writer, read_node_fn reader,
>  			ret = search_whole_btree(reader, inode, idx, &path);
>  			if (ret == SD_RES_BTREE_FOUND) {
>  				path.p_ext->vdi_id = vdi_id;
> +				/* only write the vdi_id in sd_extent */
> +				if (path.p_ext_header)
> +					header = path.p_ext_header;
> +				writer(vid_to_vdi_oid(inode->vdi_id), &vdi_id,
> +				       sizeof(vdi_id), (char *)(path.p_ext) -
> +				       (char *)header +
> +				       offsetof(struct sd_extent, vdi_id), 0,
> +				       inode->nr_copies, inode->copy_policy,
> +				       false, false);
>  				goto out;
>  			} else {
>  				ret = insert_new_node(writer, reader, inode,
> @@ -661,6 +670,25 @@ out:
>  	return ret;
>  }
>  
> +/*
> + * Only write the vid for non-hyper-volume vdi.
> + * For hyper-volume, sd_inode_set_vid() have already wrote the vid when
> + * search_whole_btree() return SD_RES_BTREE_FOUND. Therefore, this function
> + * could only be called when vid could be found.
> + */
> +int sd_inode_write_vid_only(write_node_fn writer, struct sd_inode *inode,
> +			    uint32_t idx, uint32_t vid, int flags, bool create,
> +			    bool direct)
> +{
> +	int ret = SD_RES_SUCCESS;
> +	if (inode->store_policy == 0)
> +		ret = writer(vid_to_vdi_oid(vid), &vid, sizeof(vid),
> +			     SD_INODE_HEADER_SIZE + sizeof(vid) * idx,
> +			     flags, inode->nr_copies, inode->copy_policy,
> +			     create, direct);
> +	return ret;
> +}
> +
>  /* Write the meta-data of inode out */
>  int sd_inode_write_vid(write_node_fn writer, struct sd_inode *inode,
>  		       uint32_t idx, uint32_t vid, uint32_t value,
> diff --git a/sheep/ops.c b/sheep/ops.c
> index 1e9bc1e..403dfa2 100644
> --- a/sheep/ops.c
> +++ b/sheep/ops.c
> @@ -842,8 +842,8 @@ static int local_discard_obj(struct request *req)
>  	/* if vid in idx is not exist, we don't need to remove it */
>  	if (tmp_vid) {
>  		INODE_SET_VID(inode, idx, 0);
> -		ret = sd_inode_write_vid(sheep_bnode_writer, inode, idx, vid,
> -					 0, 0, false, false);
> +		ret = sd_inode_write_vid_only(sheep_bnode_writer, inode, idx,
> +					      vid, 0, false, false);
>  		if (ret != SD_RES_SUCCESS)
>  			goto out;
>  		if (sd_remove_object(oid) != SD_RES_SUCCESS)

With this patch, I'm really more confused by how to properly call INODE_SET_VID
and sd_inode_write{,_vid} to do what I want to do.

I think *usage* of these API to sd_inode is badly needed, cause no one execept
you can really make use of them. (how to call them in combination to achieve something)

For e.g, both old array and btree assume that using one index for vdi to indicate
if the corresponding object is allocated or not.

 data[idx] = vid # indicate this object is allocated,
 data[idx] = 0 # indicate this object is not allocated

So when we create a object, we need to
  1 firstly create the data object
  2 update the inode
    - set data[idx] = vid;
    - update the offset in inode that corresponds to data[idx].

When we discard a object, we need to
  1 firstly remove the data object
  2 update the inode
    - set data[idx] = 0;
    - update the offset in inode that corresponds to data[idx].

Sometimes we want to update the inode in a batch mode, e.g we want to create 10
objects in one go, we need to
  1 firstly create 10 data objects
  2 update the inode
    - set data[idx...idx+10] = vid;
    - update the offset in inode that corresponds to data[idx...idx+10]

"set data" just update the local copy of inode,
"update the offset" really upate inode in the backend by a single write.

Aboves can be easily achieved by old array approach but I am not sure how
diffcult it is for btree.

Thanks
Yuan



More information about the sheepdog mailing list