[sheepdog-users] Users inputs for new reclaim algorithm, please
Hitoshi Mitake
mitake.hitoshi at gmail.com
Mon Mar 17 14:19:15 CET 2014
At Mon, 17 Mar 2014 16:43:16 +0800,
Liu Yuan wrote:
>
> On Mon, Mar 17, 2014 at 04:35:50PM +0800, Liu Yuan wrote:
> > On Mon, Mar 17, 2014 at 04:12:03PM +0800, Liu Yuan wrote:
> > > Hi all,
> > >
> > > I think this would be a big topic regarding the new deletion algorithm, which is
> > > currently being undertaken by Hitoshi.
> > >
> > > The motivation is very well explained as follows:
> > >
> > > $ dog vdi create image
> > > $ dog vdi write image < some_data
> > > $ dog vdi snapshot image -s snap1
> > > $ dog vdi write image < some_data
> > > $ dog vdi delete image <- this doesn't reclaim the objects
> > > of the image
> > > $ dog vdi delete image -s snap1 <- this reclaims all the data objects
> > > of both image and image:snap1
> > >
> > > Simply put, we use a simple and stupid algorithm: the space is released only
> > > when all the vdis on the snapshot chain are deleted.
> > >
> > > The new algorithm adds more complexity to handle this problem, but it also
> > > introduces a big new problem.
> > >
> > > With new algorithm,
> > >
> > > $ dog vdi create image
> > > $ dog vdi snapshot image -s snap1
> > > $ dog vdi clone -s snap1 image clone
> > > $ dog vdi delete clone <-- this operation will surprise you: it won't
> > > release space but will instead increase the space used.
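
(Aside: one way to picture why deleting a clone can consume space. This is only my reading of the thread, not the actual sheepdog implementation: if the new scheme keeps persistent per-object reference records for garbage collection, deleting a clone must first write one bookkeeping record for every object the clone shares with its base snapshot, and that metadata can outweigh the clone-private data being freed. A toy sketch, with all names and sizes invented:)

```python
# Toy sketch, NOT sheepdog code: net space effect of deleting a clone
# under a hypothetical persistent reference-tracking scheme.  Freeing the
# clone's private objects removes data, but each object shared with the
# base snapshot needs a gc bookkeeping record written first.

RECORD_SIZE = 4 << 20  # invented: 4 MB of gc metadata per shared object

def net_space_change(private_bytes, shared_object_count):
    """Return the net change in used space (negative means space freed)."""
    written = RECORD_SIZE * shared_object_count   # new gc metadata
    return written - private_bytes                # minus the data removed

# a clone with 316 MB of private data sharing 1500 objects with its snapshot
change = net_space_change(316 << 20, 1500)
print(change > 0)   # True: deletion *grows* used space in this model
```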
> > >
> > > The following is a real case. We can see that deleting a clone, which uses 316MB
> > > of space, will actually cause 5.2GB more space to be used.
> > >
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > > Name Id Size Used Shared Creation time VDI id Copies Tag
> > > c clone 0 40 GB 316 MB 1.5 GB 2014-03-17 14:35 72a1e2 2:2
> > > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 14:16 7c2b25 2:2
> > > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 14:34 7c2b26 2:2
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > > Id Size Used Avail Use%
> > > 0 39 GB 932 MB 38 GB 2%
> > > 1 39 GB 878 MB 38 GB 2%
> > > 2 39 GB 964 MB 38 GB 2%
> > > 3 39 GB 932 MB 38 GB 2%
> > > 4 39 GB 876 MB 38 GB 2%
> > > 5 39 GB 978 MB 38 GB 2%
> > > Total 234 GB 5.4 GB 229 GB 2%
> > >
> > > Total virtual image size 80 GB
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > > Id Size Used Avail Use%
> > > 0 34 GB 1.7 GB 33 GB 4%
> > > 1 34 GB 1.7 GB 33 GB 4%
> > > 2 35 GB 1.9 GB 33 GB 5%
> > > 3 35 GB 1.8 GB 33 GB 5%
> > > 4 35 GB 1.8 GB 33 GB 5%
> > > 5 35 GB 1.9 GB 33 GB 5%
> > > Total 208 GB 11 GB 197 GB 5%
> > >
> > > Total virtual image size 40 GB
> > > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > > Name Id Size Used Shared Creation time VDI id Copies Tag
> > > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 14:16 7c2b25 2:2
> > > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 14:34 7c2b26 2:2
> >
> > For comparison, here is the same real case with the current (old) algorithm:
> >
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi list
> > Name Id Size Used Shared Creation time VDI id Copies Tag
> > c clone 0 40 GB 320 MB 1.5 GB 2014-03-17 16:27 72a1e2 2:2
> > s test 1 40 GB 1.8 GB 0.0 MB 2014-03-17 16:22 7c2b25 2:2 base
> > test 0 40 GB 0.0 MB 1.8 GB 2014-03-17 16:26 7c2b26 2:2
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id Size Used Avail Use%
> > 0 40 GB 732 MB 39 GB 1%
> > 1 40 GB 706 MB 39 GB 1%
> > 2 40 GB 724 MB 39 GB 1%
> > 3 40 GB 740 MB 39 GB 1%
> > 4 40 GB 708 MB 39 GB 1%
> > 5 40 GB 782 MB 39 GB 1%
> > Total 240 GB 4.3 GB 236 GB 1%
> >
> > Total virtual image size 80 GB
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > yliu at ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id Size Used Avail Use%
> > 0 41 GB 638 MB 40 GB 1%
> > 1 40 GB 608 MB 40 GB 1%
> > 2 40 GB 614 MB 40 GB 1%
> > 3 40 GB 624 MB 40 GB 1%
> > 4 40 GB 606 MB 40 GB 1%
> > 5 41 GB 662 MB 40 GB 1%
> > Total 243 GB 3.7 GB 239 GB 1%
> >
> > Total virtual image size 40 GB
> >
> > We can see that the old algorithm is much more space-efficient than the new
> > one in two ways:
> > - there is no extra space used for internal bookkeeping data
> > old: 4.3GB used = (1.8GB + 320MB) x 2
> > new: 5.4GB used = 4.3GB data + 1.1GB internal data for gc
> >
> > - deletion of a clone is much faster because we really delete the clone's
> > objects.
> > old: 320MB*2 = 0.6GB of data removed
> > new: 320MB*2 = 0.6GB of data removed + 5.8GB more data created for gc
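
(Aside: the space arithmetic above can be checked directly, with sizes in GB and the 2 copies per object shown in the `dog vdi list` output:)

```python
# Rough check of the space figures quoted above (2 copies of each object).
copies = 2
test_used = 1.8      # GB used by the 'test' snapshot
clone_used = 0.32    # GB (~320 MB) used by the clone

old_total = (test_used + clone_used) * copies   # old algorithm: data only
new_total = 5.4                                 # observed with new algorithm
gc_overhead = new_total - old_total             # extra internal gc data

print(round(old_total, 2))    # 4.24 -> matches the reported ~4.3GB
print(round(gc_overhead, 2))  # 1.16 -> matches the reported ~1.1GB
```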
> >
> > I'm wondering if we should have the two algorithms co-exist and let users
> > choose one over the other, like
> >
> > $ dog cluster format --gc xxx
> > or
> > $ dog vdi create new --gc xxx
> >
>
> Besides, for clones, I notice that IO performance for a clone VM drops from 57MB/s
> to 37MB/s on my box for dd writes. I think it is because of the overhead of creating
> gc objects on write.
Only CoW operations incur performance overhead related to the GC. The
overhead will not be seen from the second write onward.
Thanks,
Hitoshi