[sheepdog] [PATCH v5 04/14] sheep: introduce generational reference counting for object reclaim
Liu Yuan
namei.unix at gmail.com
Tue Mar 4 14:28:07 CET 2014
On Tue, Mar 04, 2014 at 02:42:48PM +0900, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <mitake.hitoshi at gmail.com>
>
> Generational reference counting is an algorithm to reclaim data
> efficiently without race conditions on distributed system. This
> extends vdi objects structure to store generational reference counts,
> and increments the counts when creating snapshots.
>
> Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> Cc: Valerio Pachera <sirio81 at gmail.com>
> Cc: Alessandro Bolgia <alessandro at extensys.it>
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
>
> v5:
> - update store version and break compatibility explicitly
> - rename data_ref -> gref
>
> v4:
> - remove a bug in snapshot_vdi(), storing an invalid number of references
>
> include/sheepdog_proto.h | 6 +++++
> sheep/config.c | 2 +-
> sheep/migrate.c | 8 +++++++
> sheep/vdi.c | 58 +++++++++++++++++++++++++++++++++++-----------
> 4 files changed, 59 insertions(+), 15 deletions(-)
>
> diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
> index 9361bad..9937497 100644
> --- a/include/sheepdog_proto.h
> +++ b/include/sheepdog_proto.h
> @@ -212,6 +212,11 @@ struct sd_rsp {
> };
> };
>
> +struct generation_reference {
> + int32_t generation;
> + int32_t count;
> +};
> +
> struct sd_inode {
> char name[SD_MAX_VDI_LEN];
> char tag[SD_MAX_VDI_TAG_LEN];
> @@ -230,6 +235,7 @@ struct sd_inode {
> uint32_t child_vdi_id[MAX_CHILDREN];
> uint32_t data_vdi_id[SD_INODE_DATA_INDEX];
> uint32_t btree_counter;
> + struct generation_reference gref[SD_INODE_DATA_INDEX];
> };
This patch set passes tests on my box, great!
For better compatibility, I'd suggest putting the gref array in a special
object, like the btree intermediate node object, instead of embedding it in
the inode, and putting 'btree_counter' in the unused field (child_vdi_id).
Then we can keep the current inode layout and won't need to modify the QEMU
and TGT backend code to support hyper volume later.
This way
- the inode won't become cumbersome and too big as more and more fields are
  added.
- the upper layers won't need to be aware of inode layout changes and stay
  consistent with sd_inode.
For easier restructuring, I think you can add two patches on top of the
current patch set:
- one for moving btree_counter, since it is not currently used by client code
  but will be in the future when we add hyper volume support.
- one for adding a special object to hold the arrays of generation_reference.
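
To make the suggestion concrete, here is a rough sketch of what I mean. It is
not real sheepdog code: the ex_* names, the tiny array sizes, the choice of
child_vdi_id[0] as the slot for the counter, and the simplified snapshot
helper are all assumptions for illustration. Only struct generation_reference
is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_CHILDREN 4	/* hypothetical, not the real MAX_CHILDREN */
#define EX_DATA_INDEX   8	/* hypothetical, not the real SD_INODE_DATA_INDEX */

/* Taken from the patch. */
struct generation_reference {
	int32_t generation;
	int32_t count;
};

/* Trimmed stand-in for sd_inode: no gref[] and no btree_counter member. */
struct ex_inode {
	uint32_t child_vdi_id[EX_MAX_CHILDREN];
	uint32_t data_vdi_id[EX_DATA_INDEX];
};

/* Hypothetical side object that holds the reference counts. */
struct ex_gref_object {
	struct generation_reference gref[EX_DATA_INDEX];
};

/* Assumption: slot 0 of child_vdi_id is free to carry the counter. */
static inline uint32_t ex_get_btree_counter(const struct ex_inode *inode)
{
	return inode->child_vdi_id[0];
}

static inline void ex_set_btree_counter(struct ex_inode *inode, uint32_t v)
{
	inode->child_vdi_id[0] = v;
}

/*
 * Simplified stand-in for the snapshot path: bump the count of every
 * slot that points at a data object. The counts live in the side
 * object, so the inode itself is only read.
 */
static void ex_snapshot_grefs(const struct ex_inode *inode,
			      struct ex_gref_object *g)
{
	for (size_t i = 0; i < EX_DATA_INDEX; i++)
		if (inode->data_vdi_id[i])
			g->gref[i].count++;
}
```

With a layout like this, the inode keeps the size that existing clients
expect, and only sheep has to know about the extra object.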
Thanks
Yuan