[sheepdog] User input for new reclaim algorithm, please

Liu Yuan namei.unix at gmail.com
Mon Mar 17 09:43:16 CET 2014


On Mon, Mar 17, 2014 at 04:35:50PM +0800, Liu Yuan wrote:
> On Mon, Mar 17, 2014 at 04:12:03PM +0800, Liu Yuan wrote:
> > Hi all,
> > 
> >    I think this will be a big topic: the new deletion algorithm, which is
> > currently being developed by Hitoshi.
> > 
> >  The motivation is very well explained as follows:
> > 
> >  $ dog vdi create image
> >  $ dog vdi write image < some_data
> >  $ dog vdi snapshot image -s snap1
> >  $ dog vdi write image < some_data
> >  $ dog vdi delete image            <- this doesn't reclaim the objects
> >                                          of the image
> >  $ dog vdi delete image -s snap1   <- this reclaims all the data objects
> >                                          of both image and image:snap1
> > 
> > Simply put, we currently use a simple and stupid algorithm: the space is released
> > only when all the vdis on the snapshot chain are deleted.
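A toy Python model of the old rule just described may make it concrete. It is a sketch under simplifying assumptions, not sheepdog's code: the `Vdi` and `SnapshotChain` classes are hypothetical, object ids are plain integers, and only the snapshot-chain rule is modelled (no clones, replicas or on-disk layout). Deleting a VDI only marks it deleted; the chain's objects are freed once every VDI in the chain is deleted, mirroring the `dog vdi delete` sequence above.

```python
from dataclasses import dataclass, field


@dataclass
class Vdi:
    """Hypothetical stand-in for a VDI: a name plus the data objects it wrote."""
    name: str
    objects: set[int] = field(default_factory=set)
    deleted: bool = False


class SnapshotChain:
    """Toy model of the old rule: nothing in a snapshot chain is reclaimed
    until every VDI in that chain has been deleted."""

    def __init__(self) -> None:
        self.vdis: list[Vdi] = []

    def add(self, vdi: Vdi) -> Vdi:
        self.vdis.append(vdi)
        return vdi

    def stored_objects(self) -> set[int]:
        """Objects still occupying space anywhere in the chain."""
        return set().union(*(v.objects for v in self.vdis))

    def delete(self, vdi: Vdi) -> set[int]:
        """Mark `vdi` deleted and return the set of objects actually reclaimed."""
        vdi.deleted = True
        if not all(v.deleted for v in self.vdis):
            return set()  # some VDI in the chain is still alive: keep everything
        freed = self.stored_objects()
        for v in self.vdis:
            v.objects.clear()
        return freed


if __name__ == "__main__":
    # Replay the scenario: image snapshotted as snap1, then image and snap1 deleted.
    chain = SnapshotChain()
    snap1 = chain.add(Vdi("image:snap1", objects={1, 2}))
    image = chain.add(Vdi("image", objects={3}))
    print(sorted(chain.delete(image)))  # nothing freed yet: []
    print(sorted(chain.delete(snap1)))  # whole chain freed: [1, 2, 3]
```

The point of the model is that reclaim is all-or-nothing per chain, which is cheap (no per-object bookkeeping) but leaves space allocated until the last VDI in the chain goes away.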
> > 
> > The new algorithm adds more complexity to handle this problem, but it also introduces
> > a big new problem.
> > 
> > With new algorithm,
> > 
> >  $ dog vdi create image
> >  $ dog vdi snapshot image -s snap1
> >  $ dog vdi clone -s snap1 image clone
> >  $ dog vdi delete clone  <-- this operation will surprise you: it won't
> >                              release space but instead increases space usage.
> > 
> > The following is a real case: deleting a clone, which uses 316MB of
> > space, actually causes 5.2GB more space to be used.
> > 
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> >   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> > c clone        0   40 GB  316 MB  1.5 GB 2014-03-17 14:35   72a1e2    2:2              
> > s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
> >   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id	Size	Used	Avail	Use%
> >  0	39 GB	932 MB	38 GB	  2%
> >  1	39 GB	878 MB	38 GB	  2%
> >  2	39 GB	964 MB	38 GB	  2%
> >  3	39 GB	932 MB	38 GB	  2%
> >  4	39 GB	876 MB	38 GB	  2%
> >  5	39 GB	978 MB	38 GB	  2%
> > Total	234 GB	5.4 GB	229 GB	  2%
> > 
> > Total virtual image size	80 GB
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id	Size	Used	Avail	Use%
> >  0	34 GB	1.7 GB	33 GB	  4%
> >  1	34 GB	1.7 GB	33 GB	  4%
> >  2	35 GB	1.9 GB	33 GB	  5%
> >  3	35 GB	1.8 GB	33 GB	  5%
> >  4	35 GB	1.8 GB	33 GB	  5%
> >  5	35 GB	1.9 GB	33 GB	  5%
> > Total	208 GB	11 GB	197 GB	  5%
> > 
> > Total virtual image size	40 GB
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> >   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> > s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
> >   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              
> 
> For comparison, here is the same real case with the current (old) algorithm:
> 
> yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> c clone        0   40 GB  320 MB  1.5 GB 2014-03-17 16:27   72a1e2    2:2              
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 16:22   7c2b25    2:2          base
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 16:26   7c2b26    2:2              
> yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	40 GB	732 MB	39 GB	  1%
>  1	40 GB	706 MB	39 GB	  1%
>  2	40 GB	724 MB	39 GB	  1%
>  3	40 GB	740 MB	39 GB	  1%
>  4	40 GB	708 MB	39 GB	  1%
>  5	40 GB	782 MB	39 GB	  1%
> Total	240 GB	4.3 GB	236 GB	  1%
> 
> Total virtual image size	80 GB
> yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	41 GB	638 MB	40 GB	  1%
>  1	40 GB	608 MB	40 GB	  1%
>  2	40 GB	614 MB	40 GB	  1%
>  3	40 GB	624 MB	40 GB	  1%
>  4	40 GB	606 MB	40 GB	  1%
>  5	41 GB	662 MB	40 GB	  1%
> Total	243 GB	3.7 GB	239 GB	  1%
> 
> Total virtual image size	40 GB
> 
> We can see that the old algorithm is much more efficient than the new one in two ways:
>  - it needs no extra space for internal bookkeeping data
>    old: 4.3GB used, roughly (1.8GB + 320MB) x 2
>    new: 5.4GB used, 4.3GB data + 1.1GB internal data for gc
> 
>  - deleting a clone is much faster because we actually delete the clone's
>    objects.
>    old: 320MB x 2 = 0.6GB of data removed
>    new: 320MB x 2 = 0.6GB of data removed + 5.8GB more data created for gc
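The comparison above can be reproduced as plain arithmetic on the quoted `dog vdi list` and `dog node info` output. This is a worked sketch, not a measurement: all figures are copied from the transcripts (in GB), `dog` rounds its display (e.g. "11 GB"), so results are approximate, and, as in the comparison above, the gc bookkeeping is taken as the difference between the totals of the two separate runs. The `replicated` helper is a hypothetical name used only here.

```python
# Figures copied from the quoted transcripts, in GB (only as precise as dog's display).
COPIES = 2  # "Copies 2:2" column

# Old-algorithm run
old_snapshot_used = 1.8  # "s test" Used
old_clone_used = 0.32    # "c clone" Used (320 MB)
old_total_before = 4.3   # node info Total Used, before deleting the clone
old_total_after = 3.7    # node info Total Used, after deleting the clone

# New-algorithm run
new_total_before = 5.4   # node info Total Used, before deleting the clone
new_total_after = 11.0   # node info Total Used, after deleting the clone


def replicated(used_gb: float, copies: int = COPIES) -> float:
    """Physical space consumed by logical data stored with `copies` replicas."""
    return used_gb * copies


expected_data = replicated(old_snapshot_used + old_clone_used)  # about 4.24, displayed as 4.3
gc_bookkeeping = new_total_before - old_total_before            # extra space in the new run
old_delete_delta = old_total_after - old_total_before           # negative: space freed
new_delete_delta = new_total_after - new_total_before           # positive: space grew

for label, value in [
    ("replicated data (old)", expected_data),
    ("gc bookkeeping (new)", gc_bookkeeping),
    ("delete clone, old", old_delete_delta),
    ("delete clone, new", new_delete_delta),
]:
    print(f"{label:24s} {value:+.2f} GB")
```

Computed from the rounded Total rows, the clone deletion grows usage by about 5.6GB in the new run and frees about 0.6GB in the old one.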
> 
> I'm wondering if we should let the two algorithms coexist and have users choose
> one over the other, e.g.
> 
>  $ dog cluster format --gc xxx
> or
>  $ dog vdi create new --gc xxx
> 

Besides, for clones, I notice that the IO performance of a clone VM drops from 57MB/s
to 37MB/s on my box for a dd write. I think this is due to the overhead of creating
gc objects on write.
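For scale, the two dd numbers work out as follows. This is plain arithmetic on the reported figures; reading 57MB/s as the old algorithm and 37MB/s as the new one is an interpretation of the text above.

```python
# Sequential dd write throughput reported above for a clone VM (MB/s).
before_mb_s = 57.0  # assumed: old algorithm
after_mb_s = 37.0   # assumed: new algorithm (gc objects created on write)

drop = before_mb_s - after_mb_s
relative_drop = drop / before_mb_s
print(f"{drop:.0f} MB/s slower ({relative_drop:.0%} drop)")
```

That is roughly a one-third loss of write throughput on this box.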

Thanks
Yuan


