[sheepdog] [PATCH v5 04/14] sheep: introduce generational reference counting for object reclaim

Liu Yuan namei.unix at gmail.com
Wed Mar 5 06:34:59 CET 2014


On Wed, Mar 05, 2014 at 02:20:00PM +0900, Hitoshi Mitake wrote:
> At Tue, 4 Mar 2014 21:28:07 +0800,
> Liu Yuan wrote:
> > 
> > On Tue, Mar 04, 2014 at 02:42:48PM +0900, Hitoshi Mitake wrote:
> > > From: Hitoshi Mitake <mitake.hitoshi at gmail.com>
> > > 
> > > Generational reference counting is an algorithm to reclaim data
> > > efficiently without race conditions on a distributed system.  This
> > > extends the vdi object structure to store generational reference
> > > counts, and increments the counts when creating snapshots.
> > > 
> > > Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> > > Cc: Valerio Pachera <sirio81 at gmail.com>
> > > Cc: Alessandro Bolgia <alessandro at extensys.it>
> > > Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > ---
> > > 
> > > v5:
> > >  - update store version and break compatibility explicitly
> > >  - rename data_ref -> gref
> > > 
> > > v4:
> > >  - remove a bug in snapshot_vdi(), storing an invalid number of references
> > > 
> > >  include/sheepdog_proto.h |    6 +++++
> > >  sheep/config.c           |    2 +-
> > >  sheep/migrate.c          |    8 +++++++
> > >  sheep/vdi.c              |   58 +++++++++++++++++++++++++++++++++++-----------
> > >  4 files changed, 59 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/include/sheepdog_proto.h b/include/sheepdog_proto.h
> > > index 9361bad..9937497 100644
> > > --- a/include/sheepdog_proto.h
> > > +++ b/include/sheepdog_proto.h
> > > @@ -212,6 +212,11 @@ struct sd_rsp {
> > >  	};
> > >  };
> > >  
> > > +struct generation_reference {
> > > +	int32_t generation;
> > > +	int32_t count;
> > > +};
> > > +
> > >  struct sd_inode {
> > >  	char name[SD_MAX_VDI_LEN];
> > >  	char tag[SD_MAX_VDI_TAG_LEN];
> > > @@ -230,6 +235,7 @@ struct sd_inode {
> > >  	uint32_t child_vdi_id[MAX_CHILDREN];
> > >  	uint32_t data_vdi_id[SD_INODE_DATA_INDEX];
> > >  	uint32_t btree_counter;
> > > +	struct generation_reference gref[SD_INODE_DATA_INDEX];
> > >  };
> > 
> > This patch set passes tests on my box, great!
> > 
> > For better compatibility, I'd suggest
> > 
> > putting the gref array in a special object, like the btree intermediate node
> > object, instead of embedding it in the inode, and putting 'btree_counter' in
> > the unused field (child_vdi_id).
> > 
> > Then we can keep the current inode layout and support hyper volume later
> > without modifying the QEMU and TGT backend code.
> > 
> > This way
> > 
> > - the inode won't become cumbersome and too big as more and more fields are added.
> > - the upper layer won't be aware of inode layout changes and stays consistent with sd_inode
> 
> Sorry, I forgot to mention this point. The current gref scheme
> doesn't break the protocol between qemu, tgt and sheep, because gref
> is only appended, so qemu and tgt don't have to care about it.

Yes, it doesn't break compatibility, but it will make people frown when they
find that inode objects are much bigger than the ones that QEMU or TGT read
in.

Appending more fields to the inode isn't a scalable approach, and it makes the
inode size inconsistent between client and server. I think we need a flag to add
xattr-like objects to the inode, so that it works as

 [ sd_inode ] [ iobj1 ] [ iobj2 ] ... [ iobjN ]

which will make the inode more extensible.
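
To make the layout above a bit more concrete, here is a rough sketch of how
the trailing objects could be walked. This is only an illustration under my
own assumptions: struct iobj_hdr, IOBJ_TYPE_GREF and iobj_find() are
hypothetical names, not existing code, and the record format (type + length
header, then payload) is just one possible choice.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define IOBJ_TYPE_GREF 1

struct iobj_hdr {
	uint32_t type;	/* what the payload holds, e.g. IOBJ_TYPE_GREF */
	uint32_t len;	/* payload length in bytes, header excluded */
};

/*
 * Walk the records that follow the fixed-size inode and return a
 * pointer to the payload of the first record of the given type.
 * Returns NULL if no such record exists or a record is truncated.
 */
static const void *iobj_find(const void *buf, size_t buf_len,
			     size_t inode_len, uint32_t type,
			     uint32_t *len)
{
	const uint8_t *p = buf;
	size_t off = inode_len;

	while (off <= buf_len && buf_len - off >= sizeof(struct iobj_hdr)) {
		struct iobj_hdr hdr;

		/* memcpy avoids unaligned access into the raw buffer */
		memcpy(&hdr, p + off, sizeof(hdr));
		off += sizeof(hdr);
		if (hdr.len > buf_len - off)
			return NULL;	/* truncated record */
		if (hdr.type == type) {
			*len = hdr.len;
			return p + off;
		}
		off += hdr.len;
	}
	return NULL;
}
```

A client that only knows sd_inode would read the first inode_len bytes and
never look at the records, so its view of the inode stays unchanged.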

How about storing the gref arrays in our xattr objects? Then we can easily reuse
the current code to store them, and we can even dump this information with
'vdi attr get' for debugging purposes.
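
As a rough sketch of what storing gref in an attribute value might look like,
the array could be flattened into a byte string and parsed back. Everything
except struct generation_reference (taken from the patch) is hypothetical
here: gref_pack(), gref_unpack(), GREF_ENTRY_LEN and the little-endian
encoding are my assumptions, and the sketch does not touch the real attr code.

```c
#include <stdint.h>
#include <stddef.h>

/* Copied from the patch above. */
struct generation_reference {
	int32_t generation;
	int32_t count;
};

#define GREF_ENTRY_LEN 8	/* hypothetical: bytes per packed entry */

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Pack nr entries into val; returns bytes written, 0 if val is too small. */
static size_t gref_pack(const struct generation_reference *gref, size_t nr,
			uint8_t *val, size_t val_len)
{
	size_t i;

	if (nr > val_len / GREF_ENTRY_LEN)
		return 0;
	for (i = 0; i < nr; i++) {
		put_le32(val + i * GREF_ENTRY_LEN, (uint32_t)gref[i].generation);
		put_le32(val + i * GREF_ENTRY_LEN + 4, (uint32_t)gref[i].count);
	}
	return nr * GREF_ENTRY_LEN;
}

/*
 * Unpack val into gref; returns the number of entries, or 0 if the
 * value length is not a multiple of the entry size or gref is too small.
 */
static size_t gref_unpack(const uint8_t *val, size_t val_len,
			  struct generation_reference *gref, size_t max_nr)
{
	size_t i, nr = val_len / GREF_ENTRY_LEN;

	if (val_len % GREF_ENTRY_LEN || nr > max_nr)
		return 0;
	for (i = 0; i < nr; i++) {
		gref[i].generation = (int32_t)get_le32(val + i * GREF_ENTRY_LEN);
		gref[i].count = (int32_t)get_le32(val + i * GREF_ENTRY_LEN + 4);
	}
	return nr;
}
```

A fixed byte order keeps the stored value readable regardless of which node
dumps it.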

Thanks
Yuan


