[Sheepdog] [PATCH 2/2] always delete data objects when deleting a cloned vdi

levin li levin108 at gmail.com
Tue May 8 03:58:55 CEST 2012


On 05/07/2012 11:59 PM, MORITA Kazutaka wrote:
> At Mon, 07 May 2012 09:51:00 +0800,
> levin li wrote:
>> On 05/07/2012 02:58 AM, MORITA Kazutaka wrote:
>>> At Fri, 04 May 2012 09:49:08 +0800,
>>> levin li wrote:
>>>> On 05/04/2012 03:46 AM, MORITA Kazutaka wrote:
>>>>> At Mon, 23 Apr 2012 14:18:06 +0800,
>>>>> Li Wenpeng wrote:
>>>>>> From: levin li <xingke.lwp at taobao.com>
>>>>>>
>>>>>> When deleting a cloned vdi, sheep finds the root vdi and then
>>>>>> traverses the vdi chain (such as base --> snapshot --> clone) to
>>>>>> check whether there's an undeleted vdi in the chain; if some vdi
>>>>>> in the chain isn't deleted, sheep just marks the cloned vdi as
>>>>>> deleted by clearing its vdi name.
>>>>>>
>>>>>> But in fact a cloned vdi may have created its own objects by
>>>>>> copy-on-write, and these objects can be deleted when deleting the
>>>>>> vdi, so we treat the cloned vdi being deleted as a root vdi; then
>>>>>> we can delete its data objects.  In delete_one() we check whether
>>>>>> the object belongs to the vdi itself to determine whether to
>>>>>> delete it.
>>>>>>
>>>>>> Signed-off-by: levin li <xingke.lwp at taobao.com>
>>>>>> ---
>>>>>>     sheep/vdi.c |   14 ++++++++++++++
>>>>>>     1 files changed, 14 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/sheep/vdi.c b/sheep/vdi.c
>>>>>> index d2a522d..c8085c8 100644
>>>>>> --- a/sheep/vdi.c
>>>>>> +++ b/sheep/vdi.c
>>>>>> @@ -478,6 +478,12 @@ static void delete_one(struct work *work)
>>>>>>     		if (!inode->data_vdi_id[i])
>>>>>>     			continue;
>>>>>>
>>>>>> +		if (inode->data_vdi_id[i] != inode->vdi_id) {
>>>>>> +			dprintf("object %" PRIx64 " is base's data, would not be deleted.\n",
>>>>>> +					vid_to_data_oid(inode->data_vdi_id[i], i));
>>>>>> +			continue;
>>>>>> +		}
>>>>>> +
>>>>>>     		ret = remove_object(dw->entries, dw->nr_vnodes, dw->nr_zones, dw->epoch,
>>>>>>     			      vid_to_data_oid(inode->data_vdi_id[i], i),
>>>>>>     			      inode->nr_copies);
>>>>>> @@ -587,6 +593,14 @@ next:
>>>>>>     			  vid_to_vdi_oid(vid), (char *)inode,
>>>>>>     			  SD_INODE_HEADER_SIZE, 0, sys->nr_sobjs);
>>>>>>
>>>>>> +	if (vid == inode->vdi_id && inode->snap_id == 1
>>>>> What does 'inode->snap_id == 1' mean here?  I think this patch is not
>>>>> correct at all.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Kazutaka
>>>>>
>>>> When inode->snap_ctime is zero, it means this cannot be a snapshot.
>>>>
>>>> Furthermore, in the current vdi chain, the base vdi that we see with
>>>> 'collie vdi list' has a parent id which may point to the snapshot vdi
>>>> (the previous base vdi); in the case that a snapshot exists, the
>>>> snap_id of the current base vdi must be greater than 1.
>>>>
>>>> But a cloned vdi always has a snap_id of 1, so we can determine that
>>>> if inode->snap_id == 1 && inode->parent_vdi_id != 0 &&
>>>> !inode->snap_ctime, then the vdi must be a cloned vdi.
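>>>>
>>>> To make the check concrete, it would look something like this (a
>>>> rough sketch only; the helper name is made up, and the fields are
>>>> assumed to be the ones of struct sheepdog_inode from
>>>> sheepdog_proto.h):
>>>>
>>>> static int vdi_is_leaf_clone(const struct sheepdog_inode *inode)
>>>> {
>>>> 	/* A clone keeps its initial snap_id of 1, points to the
>>>> 	 * snapshot it was cloned from via parent_vdi_id, and has
>>>> 	 * never been snapshotted itself, so snap_ctime stays zero. */
>>>> 	return inode->snap_id == 1 && inode->parent_vdi_id != 0 &&
>>>> 	       !inode->snap_ctime;
>>>> }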
>>> For example:
>>>
>>>    $ collie cluster format
>>>    $ collie vdi create base 1G
>>>    $ collie vdi snapshot base
>>>    $ collie vdi clone base cloned
>>>    $ collie vdi clone -s 1 base cloned
>>>    $ collie vdi snapshot cloned
>>>    $ collie vdi list
>>>       Name        Id    Size    Used  Shared    Creation time   VDI id  Tag
>>>     s base         1  1.0 GB  0.0 MB  0.0 MB 2012-05-07 03:50   54c278
>>>       base         2  1.0 GB  0.0 MB  0.0 MB 2012-05-07 03:50   54c279
>>>     s cloned       1  1.0 GB  0.0 MB  0.0 MB 2012-05-07 03:50   c876b2
>>>       cloned       2  1.0 GB  0.0 MB  0.0 MB 2012-05-07 03:51   c876b3
>>>
>>> The snap_id of the vdi c876b2 is 1, but its snap_ctime is not zero
>>> because the vdi is a snapshot.  The snap_ctime of the vdi c876b3 is
>>> zero, but its snap_id is 2.  So, with your code, no vdi is detected as
>>> a cloned vdi against the above example.
>>>
>>> Kazutaka
>> Yes, I already considered this case.  In your example, c876b2
>> should be the cloned VDI, but now it becomes a snapshot, and c876b3
>> becomes the base VDI of c876b2, which is OK.
>>
>> The only reason we need to mark a VDI as cloned is for the deletion
>> work; let me explain why.
>>
>> By a cloned VDI I don't always mean the VDI created by the clone
>> operation, but a cloned VDI on the leaves of the VDI tree: if it has
>> some objects created by copy-on-write that are not shared with any
>> other VDI, then we can delete those objects when deleting the VDI,
>> to avoid a leak and save disk space.
>>
>> In the example, c876b3 becomes a base VDI, so we cannot delete its data
>> objects directly when deleting it, even if some of its objects were
>> created by copy-on-write, since c876b2 may share objects with it.
>>
>> The same goes for c876b2: it becomes a snapshot VDI, no longer a
>> simple cloned VDI.
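>>
>> In code, the sharing rule is just the ownership test that the patch
>> adds to delete_one(), restated here as a sketch (the helper name is
>> made up; the fields are assumed from struct sheepdog_inode):
>>
>> /* An object slot is private to this VDI only if the VDI created the
>>  * object itself by copy-on-write; otherwise the object is inherited
>>  * from a base or snapshot VDI and may still be shared. */
>> static int object_is_private(const struct sheepdog_inode *inode, int idx)
>> {
>> 	return inode->data_vdi_id[idx] == inode->vdi_id;
>> }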
> So this patch deletes the vdi's objects when it is a cloned one that
> doesn't have any snapshots?  If yes, you should say so.
>
Yes, sorry for not writing it so clearly.
> I wonder why you tried to handle only this specific situation.  I think
> we can generalize your code; Sheepdog can delete a vdi safely if all
> its descendant vdis are already deleted.
>
> Kazutaka
Certainly the current code performs well enough, but there's a problem,
which I described in the commit log of this patch:
http://lists.wpkg.org/pipermail/sheepdog/2012-May/003407.html

When we try to delete a cloned VDI, in the old logic the
objects of the cloned VDI can only be deleted after all the VDIs in
the tree path have been deleted, but there's a problem.

We may clone many VDIs from one snapshot, and these VDIs can be
created and deleted frequently, but we may *never* delete the snapshot
VDI.  In this case the objects of a deleted cloned VDI would stay on
disk forever; as we know, they're already useless and should be
deleted, since they waste too much disk space, but we can only delete
them after all the VDIs in the tree path, *including the snapshot
VDI*, have been deleted.  So we should make this better.
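For example (a hypothetical session, in the style of the one above):

  $ collie vdi create base 1G
  $ collie vdi snapshot base
  $ collie vdi clone -s 1 base clone0
  $ collie vdi clone -s 1 base clone1
  ... the guest writes to clone0, allocating copy-on-write objects ...
  $ collie vdi delete clone0

With the old logic, the copy-on-write objects of clone0 stay on disk
after the delete, because snapshot 1 of base (and clone1) may never be
deleted.  With this patch, delete_one() removes exactly the objects
whose data_vdi_id matches clone0's own vdi_id, so nothing shared with
the snapshot or with clone1 is touched.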

thanks,

levin