[sheepdog-users] Users inputs for new reclaim algorithm, please

Mon Mar 17 17:02:18 CET 2014

On Tue, Mar 18, 2014 at 12:49:54AM +0900, MORITA Kazutaka wrote:
> At Mon, 17 Mar 2014 21:43:25 +0800,
> Liu Yuan wrote:
> > > > 
> > > > With new algorithm,
> > > > 
> > > >  $ dog vdi create image
> > > >  $ dog vdi snapshot image -s snap1
> > > >  $ dog vdi clone -s snap1 image clone
> > > >  $ dog vdi delete clone  <-- surprisingly, this operation does not release
> > > >                              space but instead increases the space used.
> > > > 
> > > > The following is a real case: we can see that deleting a clone that uses 316MB
> > > > of space actually causes 5.2GB more space to be used.
> 
> This is obviously strange.  Although the algorithm creates additional
> objects to track reference counts, the objects are sparse and should
> not waste much space at all, even in the worst case.
> 
> > > > So if you have this usage in mind, you can expect a catastrophic problem:
> > > >  - frequent release and creation of cloned instances will put much more space
> > > >    pressure on you.
> > > >  - when space is near the low watermark, you cannot delete clones, because
> > > >    deletion will actually increase space usage and end up destroying your cluster.
> > > >    You have no choice but to either add more nodes or deny creation of new
> > > >    clones, and never try to delete clones later.
> 
> After the algorithm is implemented correctly, this looks like a corner
> case, since the additional space for the new reclaim algorithm is very
> small - it should be only 8 bytes for each object IIUC.  However, if
> you are still concerned about this case, we can preallocate some space
> beforehand to be used for ledger objects.  For example,
> 
>  1. Preallocate some small files for each device when sheep starts up.
> 
>  2. When the sheepdog cluster becomes disk-full and the user requests
>     object deletion, we can rename the preallocated file to a ledger
>     object and continue object reclaiming.
> 
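
To make the two steps above concrete, here is a minimal sketch in Python
(not sheep code). The file names, RESERVE_SIZE and both helper functions are
made up for illustration. The only point it tries to show is that the rename
in step 2 reuses blocks that were already allocated in step 1.

```python
import os
import tempfile

RESERVE_SIZE = 4096  # hypothetical size of one reserve file, in bytes


def preallocate(store_dir, count):
    """Step 1: create `count` small reserve files when the daemon starts."""
    paths = []
    for i in range(count):
        path = os.path.join(store_dir, ".reserve-%d" % i)
        with open(path, "wb") as f:
            # Write real zeros so that, on most filesystems, blocks are
            # allocated now instead of leaving a sparse hole.
            f.write(b"\0" * RESERVE_SIZE)
            f.flush()
            os.fsync(f.fileno())
        paths.append(path)
    return paths


def claim_ledger(reserve_paths, store_dir, ledger_name):
    """Step 2: on disk-full, turn one reserve file into a ledger object."""
    src = reserve_paths.pop()
    dst = os.path.join(store_dir, ledger_name)
    # A rename inside one filesystem needs no new data blocks.
    os.rename(src, dst)
    return dst


with tempfile.TemporaryDirectory() as store:
    reserves = preallocate(store, 2)
    ledger = claim_ledger(reserves, store, "ledger-example")
    print(os.path.basename(ledger), os.path.getsize(ledger), len(reserves))
```

Note that the reserve files have to be written, not just truncated to size:
a sparse reserve file would own no blocks and would not help once the disk
is full.
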
> Either way, I think this should be future work.  Sheepdog still
> has some bugs in handling the disk-full case even without object
> reclaiming.
> 
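
For a sense of scale, here is the 8-bytes-per-object estimate applied to the
316MB clone from the example, as a quick Python calculation. It assumes the
default 4MB data object size and one 8-byte ledger entry per data object,
which is my reading of the estimate rather than something measured.

```python
OBJECT_SIZE = 4 * 1024 * 1024   # assumption: default 4MB data objects
LEDGER_BYTES_PER_OBJECT = 8     # the estimate quoted above


def ledger_overhead(vdi_bytes):
    """Estimated ledger bytes for a VDI that occupies `vdi_bytes` of data."""
    objects = -(-vdi_bytes // OBJECT_SIZE)  # ceiling division
    return objects * LEDGER_BYTES_PER_OBJECT


clone_bytes = 316 * 1024 * 1024
print(ledger_overhead(clone_bytes))  # 79 objects * 8 bytes = 632
```

Under these assumptions the expected overhead is a few hundred bytes, so the
5.2GB growth in the real case is many orders of magnitude beyond what the
estimate predicts.
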
> > There might be some users who need this new algorithm for their specific usage,
> > but I'd suggest that we:
> > 
> >  1 make the old algorithm the default reclaim algorithm
> >  2 modularize the reclaim algorithm and add the new algorithm as an option for
> >    users. In this way, we can improve the new algorithm step by step and possibly
> >    introduce more algorithms to meet various needs.
> 
> IMHO, modularizing object reclaiming is overkill.  I cannot imagine so
> many algorithms for that.  Even if we keep the old algorithm, adding a
> sheep command line option to enable this experimental object
> reclaiming looks sufficient.  If we come up with another one, then let's
> discuss this topic again.

Keeping the old algorithm is a bottom line for me. Either a sheep option or a
static #ifdef looks fine to me. We can later refine it to be dynamically pluggable.
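
A rough sketch of the run-time option variant, in Python for brevity (not
sheep code). The --reclaim option name and both reclaim functions are
hypothetical; sheep has no such flag today. The old algorithm is the default
and the new one has to be requested explicitly.

```python
import argparse


def reclaim_old(vdi):
    """Placeholder for the existing reclaim algorithm."""
    return "old reclaim for %s" % vdi


def reclaim_new(vdi):
    """Placeholder for the experimental reclaim algorithm."""
    return "new reclaim for %s" % vdi


# Table of selectable algorithms; adding another one later is one more entry.
RECLAIM_ALGORITHMS = {"old": reclaim_old, "new": reclaim_new}


def select_reclaim(argv):
    """Pick the reclaim function from a hypothetical --reclaim option."""
    parser = argparse.ArgumentParser(prog="sheep-sketch")
    parser.add_argument("--reclaim", choices=sorted(RECLAIM_ALGORITHMS),
                        default="old")
    args = parser.parse_args(argv)
    return RECLAIM_ALGORITHMS[args.reclaim]


print(select_reclaim([])("image"))
print(select_reclaim(["--reclaim", "new"])("image"))
```

A static #ifdef would fix the same choice at build time; the table form is
closer to what a pluggable version could grow out of.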

The new algorithm looks to me more like a partial solution than a generic one,
compared with the old algorithm. So keeping the old algorithm really makes sense,
especially for those who seldom delete snapshots.

Thanks
Yuan