[sheepdog] [PATCH v5 04/14] sheep: introduce generational reference counting for object reclaim

Liu Yuan namei.unix at gmail.com
Tue Mar 4 14:28:07 CET 2014


On Tue, Mar 04, 2014 at 02:42:48PM +0900, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <mitake.hitoshi at gmail.com>
> 
> Generational reference counting is an algorithm to reclaim data
> efficiently without race conditions on a distributed system. This patch
> extends the vdi object structure to store generational reference counts,
> and increments the counts when creating snapshots.
> 
> Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> Cc: Valerio Pachera <sirio81 at gmail.com>
> Cc: Alessandro Bolgia <alessandro at extensys.it>
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
> 
> v5:
>  - update store version and break compatibility explicitly
>  - rename data_ref -> gref
> 
> v4:
>  - fix a bug in snapshot_vdi() that stored an invalid number of references
> 
>  include/sheepdog_proto.h |    6 +++++
>  sheep/config.c           |    2 +-
>  sheep/migrate.c          |    8 +++++++
>  sheep/vdi.c              |   58 +++++++++++++++++++++++++++++++++++-----------
>  4 files changed, 59 insertions(+), 15 deletions(-)
> 
> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
> index 9361bad..9937497 100644
> --- a/include/sheepdog_proto.h
> +++ b/include/sheepdog_proto.h
> @@ -212,6 +212,11 @@ struct sd_rsp {
>  	};
>  };
>  
> +struct generation_reference {
> +	int32_t generation;
> +	int32_t count;
> +};
> +
>  struct sd_inode {
>  	char name[SD_MAX_VDI_LEN];
>  	char tag[SD_MAX_VDI_TAG_LEN];
> @@ -230,6 +235,7 @@ struct sd_inode {
>  	uint32_t child_vdi_id[MAX_CHILDREN];
>  	uint32_t data_vdi_id[SD_INODE_DATA_INDEX];
>  	uint32_t btree_counter;
> +	struct generation_reference gref[SD_INODE_DATA_INDEX];
>  };

This patch set passes tests on my box, great!

For better compatibility, I'd suggest putting the gref array in a special
object, like the btree intermediate node object, instead of embedding it into
the inode, and putting 'btree_counter' in the unused field (child_vdi_id).

Then we can keep the current inode layout, and the QEMU and TGT backend code
won't need modification when we add hyper volume support later.

This way:

- the inode won't become cumbersome and too big as more and more fields are
  added.
- upper layers won't be aware of the inode layout change and stay consistent
  with sd_inode.

For easier restructuring, I think you can add two patches on top of the
current patch set:

- one for moving btree_counter, which is not used by client code yet but will
  be in the future when we add hyper volume support.

- one for adding a special object to hold the arrays of generation_reference.

Thanks
Yuan


