[sheepdog-users] Users inputs for new reclaim algorithm, please
Hitoshi Mitake
mitake.hitoshi at gmail.com
Mon Mar 17 14:19:15 CET 2014
At Mon, 17 Mar 2014 16:43:16 +0800,
Liu Yuan wrote:
>
> On Mon, Mar 17, 2014 at 04:35:50PM +0800, Liu Yuan wrote:
> > On Mon, Mar 17, 2014 at 04:12:03PM +0800, Liu Yuan wrote:
> > > Hi all,
> > >
> > > I think this would be a big topic regarding the new deletion algorithm, which is
> > > currently being undertaken by Hitoshi.
> > >
> > > The motivation is very well explained as follows:
> > >
> > > $ dog vdi create image
> > > $ dog vdi write image < some_data
> > > $ dog vdi snapshot image -s snap1
> > > $ dog vdi write image < some_data
> > > $ dog vdi delete image <- this doesn't reclaim the objects
> > > of the image
> > > $ dog vdi delete image -s snap1 <- this reclaims all the data objects
> > > of both image and image:snap1
> > >
> > > Simply put, we use a simple and stupid algorithm: the space is released only
> > > when all the vdis on the snapshot chain are deleted.
> > >
> > > The new algorithm adds more complexity to handle this problem, but it also
> > > introduces a big new problem.
> > >
> > > With new algorithm,
> > >
> > > $ dog vdi create image
> > > $ dog vdi snapshot image -s snap1
> > > $ dog vdi clone -s snap1 image clone
> > > $ dog vdi delete clone <-- this operation will surprise you: it won't
> > > release space but will instead increase the space used.
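
(Aside: one way to picture why deleting a clone can consume space. This is only my reading of the thread, not the actual sheepdog implementation: if the new scheme keeps persistent per-object reference records for garbage collection, deleting a clone must first write one bookkeeping record for every object the clone shares with its base snapshot, and that metadata can outweigh the clone-private data being freed. A toy sketch, with all names and sizes invented:)

```python
# Toy sketch, NOT sheepdog code: net space effect of deleting a clone
# under a hypothetical persistent reference-tracking scheme.  Freeing the
# clone's private objects removes data, but each object shared with the
# base snapshot needs a gc bookkeeping record written first.

RECORD_SIZE = 4 << 20  # invented: 4 MB of gc metadata per shared object

def net_space_change(private_bytes, shared_object_count):
    """Return the net change in used space (negative means space freed)."""
    written = RECORD_SIZE * shared_object_count   # new gc metadata
    return written - private_bytes                # minus the data removed

# a clone with 316 MB of private data sharing 1500 objects with its snapshot
change = net_space_change(316 << 20, 1500)
print(change > 0)   # True: deletion *grows* used space in this model
```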
> > >
> > > The following is a real case. We can see that deleting a clone, which uses 316MB
> > > of space, will actually cause 5.2GB more space to be used.
> > >
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > > Name Id Size Used Shared Creation time VDI id Copies Tag
> > > c clone 0 40 GB 316 MB 1.5 GB 2014-03-17 14:35 72a1e2 2:2
> > > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 14:16 7c2b25 2:2
> > > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 14:34 7c2b26 2:2
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > > Id Size Used Avail Use%
> > > 0 39 GB 932 MB 38 GB 2%
> > > 1 39 GB 878 MB 38 GB 2%
> > > 2 39 GB 964 MB 38 GB 2%
> > > 3 39 GB 932 MB 38 GB 2%
> > > 4 39 GB 876 MB 38 GB 2%
> > > 5 39 GB 978 MB 38 GB 2%
> > > Total 234 GB 5.4 GB 229 GB 2%
> > >
> > > Total virtual image size 80 GB
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > > Id Size Used Avail Use%
> > > 0 34 GB 1.7 GB 33 GB 4%
> > > 1 34 GB 1.7 GB 33 GB 4%
> > > 2 35 GB 1.9 GB 33 GB 5%
> > > 3 35 GB 1.8 GB 33 GB 5%
> > > 4 35 GB 1.8 GB 33 GB 5%
> > > 5 35 GB 1.9 GB 33 GB 5%
> > > Total 208 GB 11 GB 197 GB 5%
> > >
> > > Total virtual image size 40 GB
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > > Name Id Size Used Shared Creation time VDI id Copies Tag
> > > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 14:16 7c2b25 2:2
> > > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 14:34 7c2b26 2:2
> >
> > For comparison, here is the same real case with the current (old) algorithm:
> >
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > Name Id Size Used Shared Creation time VDI id Copies Tag
> > c clone 0 40 GB 320 MB 1.5 GB 2014-03-17 16:27 72a1e2 2:2
> > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 16:22 7c2b25 2:2 base
> > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 16:26 7c2b26 2:2
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id Size Used Avail Use%
> > 0 40 GB 732 MB 39 GB 1%
> > 1 40 GB 706 MB 39 GB 1%
> > 2 40 GB 724 MB 39 GB 1%
> > 3 40 GB 740 MB 39 GB 1%
> > 4 40 GB 708 MB 39 GB 1%
> > 5 40 GB 782 MB 39 GB 1%
> > Total 240 GB 4.3 GB 236 GB 1%
> >
> > Total virtual image size 80 GB
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id Size Used Avail Use%
> > 0 41 GB 638 MB 40 GB 1%
> > 1 40 GB 608 MB 40 GB 1%
> > 2 40 GB 614 MB 40 GB 1%
> > 3 40 GB 624 MB 40 GB 1%
> > 4 40 GB 606 MB 40 GB 1%
> > 5 41 GB 662 MB 40 GB 1%
> > Total 243 GB 3.7 GB 239 GB 1%
> >
> > Total virtual image size 40 GB
> >
> > We can see that the old algorithm is much more space-efficient than the new
> > one in two ways:
> > - there is no extra space used for internal bookkeeping data
> > old: 4.3GB used = (1.8GB + 320MB) x 2
> > new: 5.4GB used = 4.3GB data + 1.1GB internal data for gc
> >
> > - deletion of a clone is much faster because we really delete the clone's
> > objects.
> > old: 320MB*2 = 0.6GB of data removed
> > new: 320MB*2 = 0.6GB of data removed + 5.8GB more data created for gc
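
(Aside: the space arithmetic above can be checked directly, with sizes in GB and the 2 copies per object shown in the `dog vdi list` output:)

```python
# Rough check of the space figures quoted above (2 copies of each object).
copies = 2
test_used = 1.8      # GB used by the 'test' snapshot
clone_used = 0.32    # GB (~320 MB) used by the clone

old_total = (test_used + clone_used) * copies   # old algorithm: data only
new_total = 5.4                                 # observed with new algorithm
gc_overhead = new_total - old_total             # extra internal gc data

print(round(old_total, 2))    # 4.24 -> matches the reported ~4.3GB
print(round(gc_overhead, 2))  # 1.16 -> matches the reported ~1.1GB
```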
> >
> > I'm wondering if we should have the two algorithms co-exist and let users
> > choose one over the other, like
> >
> > $ dog cluster format --gc xxx
> > or
> > $ dog vdi create new --gc xxx
> >
>
> Besides, for clones, I notice that IO performance for a clone VM drops from 57MB/s
> to 37MB/s on my box for dd writes. I think it is because of the overhead of creating
> gc objects on write.
Only CoW operations incur performance overhead related to the GC. The
overhead will not be seen from the second write onward.
Thanks,
Hitoshi