[sheepdog-users] Users inputs for new reclaim algorithm, please

Mon Mar 17 14:16:09 CET 2014

At Mon, 17 Mar 2014 16:12:03 +0800,
Liu Yuan wrote:
> 
> Hi all,
> 
>    I think this would be a big topic regarding the new deletion algorithm, which is
> currently being undertaken by Hitoshi.
> 
>  The motivation is very well explained as follows:
> 
>  $ dog vdi create image
>  $ dog vdi write image < some_data
>  $ dog vdi snapshot image -s snap1
>  $ dog vdi write image < some_data
>  $ dog vdi delete image            <- this doesn't reclaim the objects
>                                          of the image
>  $ dog vdi delete image -s snap1   <- this reclaims all the data objects
>                                          of both image and image:snap1
> 
> Simply put, we use a simple and stupid algorithm: only when all the vdis on the
> snapshot chain are deleted will the space be released.
> 
> The new algorithm adds more complexity to handle this problem, but also introduces
> a new big problem.
> 
> With the new algorithm,
> 
>  $ dog vdi create image
>  $ dog vdi snapshot image -s snap1
>  $ dog vdi clone -s snap1 image clone
>  $ dog vdi delete clone  <-- surprisingly, this operation won't release
>                              space but will instead use more of it.
> 
> Following is the real case: we can see that deletion of a clone, which uses 316MB
> of space, will actually cause 5.2GB more space to be used.

Patchset v7 consumes a large amount of disk space for ledger objects
because of an incorrect implementation of sparse objects. It was just a
bug. The latest snapshot-object-reclaim branch has the fix.

> 
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> c clone        0   40 GB  316 MB  1.5 GB 2014-03-17 14:35   72a1e2    2:2              
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	39 GB	932 MB	38 GB	  2%
>  1	39 GB	878 MB	38 GB	  2%
>  2	39 GB	964 MB	38 GB	  2%
>  3	39 GB	932 MB	38 GB	  2%
>  4	39 GB	876 MB	38 GB	  2%
>  5	39 GB	978 MB	38 GB	  2%
> Total	234 GB	5.4 GB	229 GB	  2%
> 
> Total virtual image size	80 GB
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
> Id	Size	Used	Avail	Use%
>  0	34 GB	1.7 GB	33 GB	  4%
>  1	34 GB	1.7 GB	33 GB	  4%
>  2	35 GB	1.9 GB	33 GB	  5%
>  3	35 GB	1.8 GB	33 GB	  5%
>  4	35 GB	1.8 GB	33 GB	  5%
>  5	35 GB	1.9 GB	33 GB	  5%
> Total	208 GB	11 GB	197 GB	  5%
> 
> Total virtual image size	40 GB
> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
> s test         1   40 GB  1.8 GB  0.0 MB 2014-03-17 14:16   7c2b25    2:2              
>   test         0   40 GB  0.0 MB  1.8 GB 2014-03-17 14:34   7c2b26    2:2              
> 
> With the old algorithm, the clone's 316MB will be released without posing any problem.
> 
> I think this is a very important issue for the following use case:
> 
>  - suppose you are providing VM services with pre-defined images as bases
>  - these pre-defined images are actually snapshots in sheepdog and you
>    seldom delete them
>  - VM instances are provided by the clone operation
>  - since VM instances are all created on demand, they are likely to be released
>    or recreated very often.
> 
> So if you have this usage in mind, you can expect a catastrophic problem:
>  - frequent release and creation of cloned instances will put much more space
>    pressure on you.
>  - when space is near the low watermark, you are not allowed to delete clones because
>    deletion will actually increase the space used and end up destroying your cluster.
>    You have no choice: either add more nodes, or deny creation of new clones and
>    never try to delete clones later.
> 
> Any ideas?

Letting a sheepdog cluster run with little free disk space is really
dangerous, because the death of a few nodes can exhaust the remaining
space and kill the entire cluster. At least, our team has a guideline
that a sheepdog cluster should run with enough free disk space
(ideally, 50%).
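
To illustrate why the free-space margin matters, here is a rough
back-of-the-envelope sketch in Python. It is not sheepdog code. It
assumes equal-sized nodes, evenly distributed data, and that recovery
re-creates all lost replicas on the surviving nodes, so the total
amount of stored data stays the same. The function names are made up
for this example.

```python
def usage_after_failures(total_nodes, failed_nodes, used_fraction):
    """Estimate the used fraction of the cluster after recovery.

    Assumption (not taken from sheepdog itself): all nodes have the same
    capacity and recovery moves the failed nodes' data to the survivors.
    """
    if not 0 <= failed_nodes < total_nodes:
        raise ValueError("failed_nodes must be in [0, total_nodes)")
    survivors = total_nodes - failed_nodes
    # The same amount of data now has to fit on fewer nodes.
    return used_fraction * total_nodes / survivors


def max_tolerable_failures(total_nodes, used_fraction):
    """Largest number of node failures that still fits in the remaining space."""
    tolerated = 0
    while (tolerated + 1 < total_nodes and
           usage_after_failures(total_nodes, tolerated + 1, used_fraction) <= 1.0):
        tolerated += 1
    return tolerated


if __name__ == "__main__":
    for used in (0.5, 0.8, 0.9):
        print(used, max_tolerable_failures(6, used))
```

Under these assumptions, a 6-node cluster that is 50% full can absorb
the loss of 3 nodes, while the same cluster at 90% full cannot absorb
the loss of even one.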

If admins have to delete VDIs to free up disk space, it means that
they have already committed a serious fault. Disk space shortage should
be resolved by adding more disks/nodes, or avoided by controlling the
number of VDIs and their size (including snapshots and clones).

As an emergency solution, I can implement VDI family deletion, which
requires no additional disk space (almost all of the deletion process
would be done in dog). With this solution and qemu-img convert, users
can free disk space in a safe manner.

Thanks,
Hitoshi