[sheepdog] [PATCH v5 04/14] sheep: introduce generational reference counting for object reclaim

Hitoshi Mitake mitake.hitoshi at gmail.com
Wed Mar 5 06:13:57 CET 2014


At Tue, 4 Mar 2014 21:28:07 +0800,
Liu Yuan wrote:
> 
> On Tue, Mar 04, 2014 at 02:42:48PM +0900, Hitoshi Mitake wrote:
> > From: Hitoshi Mitake <mitake.hitoshi at gmail.com>
> > 
> > Generational reference counting is an algorithm to reclaim data
> > efficiently without race conditions on distributed system.  This
> > extends vdi objects structure to store generational reference counts,
> > and increments the counts when creating snapshots.
> > 
> > Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> > Cc: Valerio Pachera <sirio81 at gmail.com>
> > Cc: Alessandro Bolgia <alessandro at extensys.it>
> > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > ---
> > 
> > v5:
> >  - update store version and break compatibility explicitly
> >  - rename data_ref -> gref
> > 
> > v4:
> >  - remove a bug in snapshot_vdi(), storing an invalid number of references
> > 
> >  include/sheepdog_proto.h |    6 +++++
> >  sheep/config.c           |    2 +-
> >  sheep/migrate.c          |    8 +++++++
> >  sheep/vdi.c              |   58 +++++++++++++++++++++++++++++++++++-----------
> >  4 files changed, 59 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
> > index 9361bad..9937497 100644
> > --- a/include/sheepdog_proto.h
> > +++ b/include/sheepdog_proto.h
> > @@ -212,6 +212,11 @@ struct sd_rsp {
> >  	};
> >  };
> >  
> > +struct generation_reference {
> > +	int32_t generation;
> > +	int32_t count;
> > +};
> > +
> >  struct sd_inode {
> >  	char name[SD_MAX_VDI_LEN];
> >  	char tag[SD_MAX_VDI_TAG_LEN];
> > @@ -230,6 +235,7 @@ struct sd_inode {
> >  	uint32_t child_vdi_id[MAX_CHILDREN];
> >  	uint32_t data_vdi_id[SD_INODE_DATA_INDEX];
> >  	uint32_t btree_counter;
> > +	struct generation_reference gref[SD_INODE_DATA_INDEX];
> >  };
> 
> This patch set passes tests on my box, great!
> 
> For better compatibility, I'd suggest
> 
> make the gref array a special object, like the btree intermediate node object, instead
> of embedding it into the inode, and put 'btree_counter' in the unused field (child_vdi_id)
> 
> Then we can keep the current inode layout without modification of QEMU and TGT 
> backend code to support hyper volume later.
> 
> This way 
> 
> - inode won't become cumbersome and too big as more and more fields
>   are added.

I don't think a new object type for generation references is needed.
It would increase the complexity of the code and consume one of the
bits that indicate object types. There are only 3 bits for this purpose.

In addition, the gref array doesn't consume a significant amount of
disk space because of the sparse object scheme. We can also reduce the
network traffic for transmitting inode objects by sending/receiving
only offsetof(struct sd_inode, gref) bytes instead of sizeof(struct
sd_inode). That can be done easily later.
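
A minimal sketch of that idea, using a simplified stand-in for struct
sd_inode. The names ex_inode, EX_DATA_INDEX, EX_INODE_WIRE_SIZE and
ex_pack_inode() are hypothetical and exist only for this illustration;
the real structure has more fields and much larger arrays. The only
property relied on is that gref is the last member, as in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, tiny array size for illustration; not the real constant. */
#define EX_DATA_INDEX 4

struct generation_reference {
	int32_t generation;
	int32_t count;
};

/* Simplified stand-in for struct sd_inode: gref is the last member. */
struct ex_inode {
	uint32_t vdi_id;
	uint32_t data_vdi_id[EX_DATA_INDEX];
	uint32_t btree_counter;
	struct generation_reference gref[EX_DATA_INDEX];
};

/* Bytes to transmit when the receiver does not need the gref array. */
#define EX_INODE_WIRE_SIZE offsetof(struct ex_inode, gref)

static_assert(EX_INODE_WIRE_SIZE < sizeof(struct ex_inode),
	      "truncating at gref must shrink the transfer");

/*
 * Copy everything before gref into buf.
 * Returns the number of bytes copied, or 0 if buf is too small.
 */
static size_t ex_pack_inode(const struct ex_inode *inode,
			    void *buf, size_t buflen)
{
	if (buflen < EX_INODE_WIRE_SIZE)
		return 0;
	memcpy(buf, inode, EX_INODE_WIRE_SIZE);
	return EX_INODE_WIRE_SIZE;
}
```

With this layout, a sender that calls ex_pack_inode() transmits only
the part of the inode before gref, and the saving grows with the size
of the gref array.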

> - one for moving btree_counter since it is not currently used by client code but
>   will be in the future when we add hyper volume support.

I agree with this proposal, but I think the change should be appended
at the tail of the patchset so that it comes in a natural order
(child_vdi_id is removed in the 8th patch, and the move of
btree_counter should come after that).

If you agree with it, I'll send v6 later.

Thanks,
Hitoshi


