[sheepdog] Users inputs for new reclaim algorithm, please

Liu Yuan namei.unix at gmail.com
Mon Mar 17 09:35:50 CET 2014


On Mon, Mar 17, 2014 at 04:12:03PM +0800, Liu Yuan wrote:
> Hi all,
> 
>    I think the new deletion algorithm, which Hitoshi is currently working on,
> is going to be a big topic.
> 
>  The motivation is very well explained as follows:
> 
>  $ dog vdi create image
>  $ dog vdi write image < some_data
>  $ dog vdi snapshot image -s snap1
>  $ dog vdi write image < some_data
>  $ dog vdi delete image            <- this doesn't reclaim the objects
>                                          of the image
>  $ dog vdi delete image -s snap1   <- this reclaims all the data objects
>                                          of both image and image:snap1
> 
> Simply put, we currently use a simple and stupid algorithm: the space is
> released only when all the VDIs on the snapshot chain have been deleted.
> 
> The new algorithm adds more complexity to handle this problem, but it also
> introduces a big new problem.
> 
> With new algorithm,
> 
>  $ dog vdi create image
>  $ dog vdi snapshot image -s snap1
>  $ dog vdi clone -s snap1 image clone
>  $ dog vdi delete clone  <-- surprisingly, this does not release space;
>                              it increases the space used instead.
> 
> The following is a real case. Deleting a clone that uses 316 MB of space
> actually causes 5.2 GB more space to be used.
> 
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> c clone        0   40 GB  316 MB  1.5 GB 2014-03-17 14:35   72a1e2    2:2              
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	39 GB	932 MB	38 GB	  2%
>  1	39 GB	878 MB	38 GB	  2%
>  2	39 GB	964 MB	38 GB	  2%
>  3	39 GB	932 MB	38 GB	  2%
>  4	39 GB	876 MB	38 GB	  2%
>  5	39 GB	978 MB	38 GB	  2%
> Total	234 GB	5.4 GB	229 GB	  2%
> 
> Total virtual image size	80 GB
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	34 GB	1.7 GB	33 GB	  4%
>  1	34 GB	1.7 GB	33 GB	  4%
>  2	35 GB	1.9 GB	33 GB	  5%
>  3	35 GB	1.8 GB	33 GB	  5%
>  4	35 GB	1.8 GB	33 GB	  5%
>  5	35 GB	1.9 GB	33 GB	  5%
> Total	208 GB	11 GB	197 GB	  5%
> 
> Total virtual image size	40 GB
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              

For comparison, here is the same case with the current (old) algorithm:

yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
c clone        0   40 GB  320 MB  1.5 GB 2014-03-17 16:27   72a1e2    2:2              
s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 16:22   7c2b25    2:2          base
  test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 16:26   7c2b26    2:2              
yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
Id	Size	Used	Avail	Use%
 0	40 GB	732 MB	39 GB	  1%
 1	40 GB	706 MB	39 GB	  1%
 2	40 GB	724 MB	39 GB	  1%
 3	40 GB	740 MB	39 GB	  1%
 4	40 GB	708 MB	39 GB	  1%
 5	40 GB	782 MB	39 GB	  1%
Total	240 GB	4.3 GB	236 GB	  1%

Total virtual image size	80 GB
yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
Id	Size	Used	Avail	Use%
 0	41 GB	638 MB	40 GB	  1%
 1	40 GB	608 MB	40 GB	  1%
 2	40 GB	614 MB	40 GB	  1%
 3	40 GB	624 MB	40 GB	  1%
 4	40 GB	606 MB	40 GB	  1%
 5	41 GB	662 MB	40 GB	  1%
Total	243 GB	3.7 GB	239 GB	  1%

Total virtual image size	40 GB

We can see that the old algorithm is much more efficient than the new one in
two ways:
 - there is no extra space used for internal bookkeeping data
   old: 4.3 GB used, (1.8 GB + 320 MB) x 2 = 4.3 GB
   new: 5.4 GB used, 4.3 GB data + 1.1 GB internal data for gc

 - deletion of the clone is much faster because we simply delete the objects
   of the clone.
   old: 320 MB x 2 = 0.6 GB of data is removed
   new: 320 MB x 2 = 0.6 GB of data is removed + 5.8 GB more data is created for gc
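
The accounting above can be rechecked with a short script. This is only a
sketch: the inputs are the rounded figures from the dog output in this mail,
the replica count of 2 is read from the Copies column, and the helper name
replicated() is made up for illustration.

```python
# Rounded figures copied from the dog output quoted in this mail (all in GB).
COPIES = 2             # "2:2" in the Copies column
SNAPSHOT_GB = 1.8      # "Used" of the snapshot of test
CLONE_GB = 0.32        # "Used" of clone in the old-algorithm run
OLD_TOTAL_GB = 4.3     # node info total before deletion, old algorithm
NEW_TOTAL_GB = 5.4     # node info total before deletion, new algorithm
NET_GROWTH_GB = 5.2    # growth after deleting clone, as stated in the quoted mail


def replicated(logical_gb, copies=COPIES):
    """Cluster-wide space taken by logical_gb of data kept in `copies` replicas."""
    return logical_gb * copies


data_gb = replicated(SNAPSHOT_GB + CLONE_GB)    # replicated user data
gc_overhead_gb = NEW_TOTAL_GB - OLD_TOTAL_GB    # extra bookkeeping data of the new gc
freed_gb = replicated(CLONE_GB)                 # objects removed together with the clone
created_gb = NET_GROWTH_GB + freed_gb           # net growth plus what was freed

print(f"replicated data:       {data_gb:.2f} GB")
print(f"gc bookkeeping data:   {gc_overhead_gb:.2f} GB")
print(f"freed by the deletion: {freed_gb:.2f} GB")
print(f"created by the new gc: {created_gb:.2f} GB")
```

Because the inputs are rounded, the results only match the figures above to
about one decimal place; the point is the structure of the accounting, not
the last digit.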

I'm wondering if we should let the two algorithms co-exist and have users
choose one over the other, like

 $ dog cluster format --gc xxx
or
 $ dog vdi create new --gc xxx

Thanks
Yuan
   


