[sheepdog] Users inputs for new reclaim algorithm, please
Liu Yuan
namei.unix at gmail.com
Mon Mar 17 09:43:16 CET 2014
On Mon, Mar 17, 2014 at 04:35:50PM +0800, Liu Yuan wrote:
> On Mon, Mar 17, 2014 at 04:12:03PM +0800, Liu Yuan wrote:
> > Hi all,
> >
> > I think this will be a big topic: the new deletion algorithm, which is
> > currently being worked on by Hitoshi.
> >
> > The motivation is very well explained as follows:
> >
> > $ dog vdi create image
> > $ dog vdi write image < some_data
> > $ dog vdi snapshot image -s snap1
> > $ dog vdi write image < some_data
> > $ dog vdi delete image <- this doesn't reclaim the objects
> > of the image
> > $ dog vdi delete image -s snap1 <- this reclaims all the data objects
> > of both image and image:snap1
> >
> > Simply put, we currently use a simple and stupid algorithm: the space is released
> > only when all the vdis on the snapshot chain are deleted.
> >
> > The new algorithm adds more complexity to handle this problem, but also introduces
> > a big new problem.
> >
> > With new algorithm,
> >
> > $ dog vdi create image
> > $ dog vdi snapshot image -s snap1
> > $ dog vdi clone -s snap1 image clone
> > $ dog vdi delete clone <-- surprisingly, this operation won't release space
> > but will instead increase the space used.
> >
> > The following is a real case. We can see that deleting a clone, which uses 316MB
> > of space, will actually cause 5.2GB more space to be used.
> >
> > yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
> >   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> > c clone        0   40 GB  316 MB  1.5 GB 2014-03-17 14:35   72a1e2     2:2
> > s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25     2:2
> >   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26     2:2
> > yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id      Size    Used    Avail   Use%
> >  0      39 GB   932 MB  38 GB     2%
> >  1      39 GB   878 MB  38 GB     2%
> >  2      39 GB   964 MB  38 GB     2%
> >  3      39 GB   932 MB  38 GB     2%
> >  4      39 GB   876 MB  38 GB     2%
> >  5      39 GB   978 MB  38 GB     2%
> > Total   234 GB  5.4 GB  229 GB    2%
> >
> > Total virtual image size 80 GB
> > yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> > yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> > Id      Size    Used    Avail   Use%
> >  0      34 GB   1.7 GB  33 GB     4%
> >  1      34 GB   1.7 GB  33 GB     4%
> >  2      35 GB   1.9 GB  33 GB     5%
> >  3      35 GB   1.8 GB  33 GB     5%
> >  4      35 GB   1.8 GB  33 GB     5%
> >  5      35 GB   1.9 GB  33 GB     5%
> > Total   208 GB  11 GB   197 GB    5%
> >
> > Total virtual image size 40 GB
> > yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
> >   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> > s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25     2:2
> >   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26     2:2
>
> For comparison, here is the same real case with the current (old) algorithm:
>
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> c clone        0   40 GB  320 MB  1.5 GB 2014-03-17 16:27   72a1e2     2:2
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 16:22   7c2b25     2:2  base
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 16:26   7c2b26     2:2
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id      Size    Used    Avail   Use%
>  0      40 GB   732 MB  39 GB     1%
>  1      40 GB   706 MB  39 GB     1%
>  2      40 GB   724 MB  39 GB     1%
>  3      40 GB   740 MB  39 GB     1%
>  4      40 GB   708 MB  39 GB     1%
>  5      40 GB   782 MB  39 GB     1%
> Total   240 GB  4.3 GB  236 GB    1%
>
> Total virtual image size 80 GB
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id      Size    Used    Avail   Use%
>  0      41 GB   638 MB  40 GB     1%
>  1      40 GB   608 MB  40 GB     1%
>  2      40 GB   614 MB  40 GB     1%
>  3      40 GB   624 MB  40 GB     1%
>  4      40 GB   606 MB  40 GB     1%
>  5      41 GB   662 MB  40 GB     1%
> Total   243 GB  3.7 GB  239 GB    1%
>
> Total virtual image size 40 GB
>
> We can see that the old algorithm is much more space efficient than the new
> algorithm in two ways:
> - there is no extra space for internal bookkeeping data
>   old: 4.3GB is used, (1.8G + 320MB) x 2 = 4.3G
>   new: 5.4GB is used, 4.3G data + 1.1GB internal data for gc
>
> - deletion of a clone is much faster because we really delete the objects of
>   the clone.
>   old: 320MB*2 = 0.6GB of data is removed
>   new: 320MB*2 = 0.6GB of data is removed + 5.8GB more data is created for gc
>
> I'm wondering if we should have the two algorithms co-exist and let users choose
> one over the other, like
>
> $ dog cluster format --gc xxx
> or
> $ dog vdi create new --gc xxx
>
Besides, for clones, I notice that IO performance of the clone VM drops from 57MB/s
to 37MB/s on my box for dd writes. I think this is because of the overhead of creating
gc objects on write.
Thanks
Yuan
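
The space figures quoted in this thread can be sanity-checked by redoing the
arithmetic from the `dog vdi list` and `dog node info` output. The sketch below
is only illustrative: the function and variable names are made up here, the
inputs are the rounded values printed by dog, and treating 1 GB as 1024 MB is an
assumption, so results can differ from the figures in the mail by a few hundred MB.

```python
# Sanity check of the space figures quoted in this thread.
# All inputs are the rounded values printed by `dog vdi list` and
# `dog node info`; every name below is illustrative, not part of sheepdog.

MB_PER_GB = 1024  # assumption: dog reports binary units

COPIES = 2                   # the "Copies" column shows 2:2
SNAPSHOT_GB = 1.8            # "Used" of the snapshot "test"
CLONE_GB = 320 / MB_PER_GB   # "Used" of "clone" in the old-algorithm run


def replicated_gb(logical_gb, copies=COPIES):
    """Raw cluster space needed to store logical_gb with `copies` replicas."""
    return logical_gb * copies


def delta_gb(before_gb, after_gb):
    """Change in total used space; a negative value means space was reclaimed."""
    return after_gb - before_gb


# Expected raw usage for the data alone (snapshot + clone, replicated).
data_gb = replicated_gb(SNAPSHOT_GB + CLONE_GB)

# Extra space used by the new algorithm before any deletion (totals 4.3 vs 5.4).
gc_overhead_gb = delta_gb(4.3, 5.4)

# Effect of `dog vdi delete clone` on the cluster total, per algorithm.
old_delete_gb = delta_gb(4.3, 3.7)
new_delete_gb = delta_gb(5.4, 11.0)

if __name__ == "__main__":
    print(f"replicated data:          {data_gb:.1f} GB")
    print(f"gc bookkeeping overhead:  {gc_overhead_gb:+.1f} GB")
    print(f"old algorithm, on delete: {old_delete_gb:+.1f} GB")
    print(f"new algorithm, on delete: {new_delete_gb:+.1f} GB")
```

From the rounded totals, the old algorithm reclaims about 0.6 GB on deletion,
while the new one ends up using several GB more, which is in line with the
figures in the mail; the small differences come from rounding in the dog output
(the `11 GB` total in particular has only two significant digits).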