[sheepdog-users] Users inputs for new reclaim algorithm, please
Andrew J. Hobbs
ajhobbs at desu.edu
Tue Mar 18 14:08:48 CET 2014
I have to agree with Yuan here. If I'm deleting something, I expect
that space to be available for reuse elsewhere, not end up taking
several times the deleted amount.
On 03/17/2014 09:43 AM, Liu Yuan wrote:
> On Mon, Mar 17, 2014 at 10:16:09PM +0900, Hitoshi Mitake wrote:
>> At Mon, 17 Mar 2014 16:12:03 +0800,
>> Liu Yuan wrote:
>>> Hi all,
>>>
>>> I think this will be a big topic: the new deletion algorithm, which is
>>> currently being undertaken by Hitoshi.
>>>
>>> The motivation is very well explained as follows:
>>>
>>> $ dog vdi create image
>>> $ dog vdi write image < some_data
>>> $ dog vdi snapshot image -s snap1
>>> $ dog vdi write image < some_data
>>> $ dog vdi delete image            <- this doesn't reclaim the objects
>>>                                      of the image
>>> $ dog vdi delete image -s snap1   <- this reclaims all the data objects
>>>                                      of both image and image:snap1
>>>
>>> Simply put, we use a simple and stupid algorithm: the space is released only
>>> when all the VDIs on the snapshot chain have been deleted.
>>>
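The chain-wide rule quoted above can be sketched as a toy model. This is only an illustration of the rule as stated in the mail, not sheepdog code: the class and method names are made up, and a "chain" here is just the working image plus its snapshots, as in the example commands.

```python
# Toy model of the "release only when the whole chain is deleted" rule.
# Hypothetical names throughout; this is not sheepdog's implementation.

class SnapshotChain:
    def __init__(self):
        self.live = set()      # VDIs that have not been deleted
        self.pending = set()   # VDIs deleted while others were still live

    def create(self, name):
        self.live.add(name)

    def delete(self, name):
        """Delete one VDI; return True only if space is actually released."""
        self.live.remove(name)
        self.pending.add(name)
        if self.live:
            return False       # another VDI on the chain is still alive
        self.pending.clear()   # last member gone: every object can be freed
        return True


chain = SnapshotChain()
chain.create("image")
chain.create("image:snap1")
first = chain.delete("image")         # nothing reclaimed yet
second = chain.delete("image:snap1")  # whole chain reclaimed
print(first, second)
```

Deleting "image" alone frees nothing in this model, which mirrors the behaviour the example commands describe.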
>>> The new algorithm adds more complexity to handle this problem, but also
>>> introduces a big new problem.
>>>
>>> With new algorithm,
>>>
>>> $ dog vdi create image
>>> $ dog vdi snapshot image -s snap1
>>> $ dog vdi clone -s snap1 image clone
>>> $ dog vdi delete clone <-- surprisingly, this operation doesn't release
>>>                            space but instead increases the space used.
>>>
>>> The following is a real case. We can see that deleting a clone, which uses
>>> 316MB of space, actually causes 5.2GB more space to be used.
>> The patchset v7 consumes this much disk space for ledger objects
>> because of an incorrect implementation of sparse objects. It was just a
>> bug. The latest snapshot-object-reclaim branch has the fix.
>>
>>> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>>>   Name   Id  Size   Used    Shared  Creation time     VDI id  Copies  Tag
>>> c clone   0  40 GB  316 MB  1.5 GB  2014-03-17 14:35  72a1e2  2:2
>>> s test    1  40 GB  1.8 GB  0.0 MB  2014-03-17 14:16  7c2b25  2:2
>>>   test    0  40 GB  0.0 MB  1.8 GB  2014-03-17 14:34  7c2b26  2:2
>>> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
>>> Id     Size    Used    Avail   Use%
>>> 0      39 GB   932 MB  38 GB   2%
>>> 1      39 GB   878 MB  38 GB   2%
>>> 2      39 GB   964 MB  38 GB   2%
>>> 3      39 GB   932 MB  38 GB   2%
>>> 4      39 GB   876 MB  38 GB   2%
>>> 5      39 GB   978 MB  38 GB   2%
>>> Total  234 GB  5.4 GB  229 GB  2%
>>>
>>> Total virtual image size 80 GB
>>> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi delete clone
>>> yliu@ubuntu-precise:~/sheepdog$ dog/dog node info
>>> Id     Size    Used    Avail   Use%
>>> 0      34 GB   1.7 GB  33 GB   4%
>>> 1      34 GB   1.7 GB  33 GB   4%
>>> 2      35 GB   1.9 GB  33 GB   5%
>>> 3      35 GB   1.8 GB  33 GB   5%
>>> 4      35 GB   1.8 GB  33 GB   5%
>>> 5      35 GB   1.9 GB  33 GB   5%
>>> Total  208 GB  11 GB   197 GB  5%
>>>
>>> Total virtual image size 40 GB
>>> yliu@ubuntu-precise:~/sheepdog$ dog/dog vdi list
>>>   Name   Id  Size   Used    Shared  Creation time     VDI id  Copies  Tag
>>> s test    1  40 GB  1.8 GB  0.0 MB  2014-03-17 14:16  7c2b25  2:2
>>>   test    0  40 GB  0.0 MB  1.8 GB  2014-03-17 14:34  7c2b26  2:2
>>>
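As a rough cross-check of the growth figure, the per-node "Used" columns from the two `node info` outputs above can be summed. The values are copied from the transcript; the 1024 MB per GB conversion is an assumption about how dog reports units, and because the displayed values are rounded the result is only approximate.

```python
# Per-node "Used" values copied from the two `dog node info` outputs.
MB_PER_GB = 1024  # assumption: dog reports binary units

used_before_mb = [932, 878, 964, 932, 876, 978]   # before deleting the clone
used_after_gb = [1.7, 1.7, 1.9, 1.8, 1.8, 1.9]    # after deleting the clone

before_gb = sum(used_before_mb) / MB_PER_GB
after_gb = sum(used_after_gb)
growth_gb = after_gb - before_gb

print(f"before {before_gb:.1f} GB, after {after_gb:.1f} GB, "
      f"growth {growth_gb:.1f} GB")
```

With rounded inputs this gives a growth of a little over 5 GB, the same order as the figure quoted in the mail, for a clone that itself used 316 MB.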
>>> With the old algorithm, the clone's 316MB would be released without any problem.
>>>
>>> I think this is a very important issue for the following use case:
>>>
>>> - suppose you are providing VM services with pre-defined images as bases
>>> - these pre-defined images are actually snapshots in sheepdog and
>>>   you seldom delete them
>>> - VM instances are provided by the clone operation
>>> - since VM instances are all created on demand, they are likely to be released
>>>   or recreated very often.
>>>
>>> So if you have this usage in mind, you can expect a catastrophic problem:
>>> - frequent release and creation of cloned instances will put much more space
>>>   pressure on you.
>>> - when space is near the low watermark, you cannot delete clones, because
>>>   deletion will actually increase space usage and end up destroying your cluster.
>>>   You have no choice but to either add more nodes or deny creation of new clones,
>>>   and never try to delete clones later.
>>>
>>> Any ideas?
>> Letting a sheepdog cluster run with little free disk space is really
>> dangerous, because the death of a few nodes can exhaust the remaining space and
>> kill the entire cluster. At least, our team has a guideline that a sheepdog
>> cluster should run with enough free disk space (ideally, 50%).
>>
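The node-failure concern above can be put into numbers with a small sketch. It assumes equally sized nodes and that all stored data (replicas included) is spread evenly over the surviving nodes after a failure; both are simplifying assumptions, and the function name is made up for illustration.

```python
def tolerable_node_failures(nodes, node_size_gb, used_fraction):
    """Largest number of dead nodes whose data still fits on the survivors.

    Simplified model: equal-sized nodes, total stored data stays constant
    and is re-spread evenly over the remaining nodes.
    """
    total_used = nodes * node_size_gb * used_fraction
    lost = 0
    # Keep at least one node alive; stop once the survivors would overflow.
    while lost + 1 < nodes and total_used <= (nodes - lost - 1) * node_size_gb:
        lost += 1
    return lost


# Six 39 GB nodes, as in the transcript earlier in the thread.
at_half = tolerable_node_failures(6, 39, 0.5)
at_ninety = tolerable_node_failures(6, 39, 0.9)
print(at_half, at_ninety)
```

In this model a six-node cluster at 50% usage can lose three nodes before the survivors are full, while at 90% usage it cannot absorb the loss of even one.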
> I don't buy this idea. Sheepdog is said to be a cheap storage solution, and if
> we can only run safely at 50% capacity, it means that we double the storage
> cost. More importantly, any storage system that can't reclaim space by deletion
> when it is nearly full should be considered unacceptable.
>
> Suppose you are writing a file system and telling people not to fill it more
> than 50%, and that if they do, say they reach 90%, the file system is dead:
> they can't reclaim any space at all by deleting and, worse, any deletion
> will destroy the file system.
>
> The bottom line is that deletion should reduce space consumption instead of
> increasing it, because for any user and any system, deletion
> *means* reclaiming space. We should never break this intuition.
>
>> If admins have to delete VDIs to free up disk space, it means that
>> they have already committed a serious fault.
> As commented above, to me it is a serious fault that I can't reclaim space
> by simply deleting existing VDIs.
>
>> Disk space shortage should be
>> resolved by adding more disks/nodes or avoided by controlling the number
>> of VDIs and their size (including snapshots, clones).
>>
>> As an emergency solution, I can implement VDI family deletion, which
>> requires no additional disk space (almost all of the deletion work
>> would be done in dog). With this solution and qemu-img
>> convert, users can free disk space in a safe manner.
>>
> We need this dirty workaround only because of the new algorithm. So what do we gain
> from it, given that it introduces more and more problems (performance degradation,
> the deletion problem and other unseen problems)? We merely solve the problem that
> people want to free space after snapshots are deleted!
>
> Note that, ironically, the initial motivation was to reclaim space on snapshot
> deletion, but with this new algorithm we actually can't reclaim space in a broader
> use case: deleting clones is off limits once you find storage is running out of space!
>
> Please think twice about the new algorithm, and whether we need it only because
> we wrote it rather than because of real needs....
>
> There might be some users who need this new algorithm for their specific usage, but
> I'd suggest that we:
>
> 1 make the old algorithm the default reclaim algorithm
> 2 modularize the reclaim algorithm and add the new algorithm as an option for users.
>   In this way, we can improve the new algorithm step by step and possibly
>   introduce more algorithms to meet various needs.
>
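One way to picture the two suggestions above is a small policy registry with the old behaviour as the default. This is a sketch only: every name is hypothetical, nothing here is sheepdog's API, and the "eager" policy is a stand-in for "some other algorithm", not the snapshot-object-reclaim implementation.

```python
# Sketch of a pluggable reclaim policy. A chain maps VDI name -> alive flag;
# a policy returns the set of VDIs whose objects may be freed right now.
from typing import Callable, Dict, Set

Policy = Callable[[Dict[str, bool], str], Set[str]]
POLICIES: Dict[str, Policy] = {}


def register(name: str):
    def wrap(fn: Policy) -> Policy:
        POLICIES[name] = fn
        return fn
    return wrap


@register("chain")
def chain_wide(chain, target):
    """Old rule: free nothing until the target is the last live VDI."""
    others_alive = any(alive for n, alive in chain.items() if n != target)
    return set() if others_alive else set(chain)


@register("eager")
def eager(chain, target):
    """Placeholder alternative: free the target's own objects at once."""
    return {target}


DEFAULT_POLICY = "chain"  # suggestion 1: the old algorithm stays the default


def delete_vdi(chain, target, policy=DEFAULT_POLICY):
    freed = POLICIES[policy](chain, target)
    chain[target] = False          # mark deleted
    for name in freed:
        del chain[name]            # objects released, entry disappears
    return freed


chain = {"image": True, "image:snap1": True}
step1 = delete_vdi(chain, "image")
step2 = delete_vdi(chain, "image:snap1")
print(step1, step2)
```

Keeping the selection behind one name lookup means a different policy can be tried per cluster without touching the default path.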
> Thanks
> Yuan