On 05/04/2012 03:37 AM, MORITA Kazutaka wrote:
> At Thu, 3 May 2012 18:25:45 +0800,
> levin li wrote:
>> Take a view of the following snapshot chain:
>>
>> base vdi --> snapshot vdi --> cloned vdi
>>
>> when the cloned vdi has its own data objects created by copy-on-write,
>> we first delete the cloned vdi, then delete the base vdi, and at last
>> we delete the snapshot vdi.
>>
>> when deleting the snapshot vdi, it tries to traverse the snapshot chain
>> to clean up all the data objects, but the copy-on-write objects have been
>
> What does 'copy-on-write objects' mean here?  Data objects of the
> cloned VDI?
>

Yes, I mean the data objects of the cloned VDI that are not shared with
its base VDI.

>> deleted by the first deletion, so the traversal may fail and set
>> dw->delete_error to true; it doesn't matter for future deletions,
>
> I'm not sure why dw->delete_error could be true even if there is no
> problem.  Does the current delete process also traverse the already
> deleted VDIs?  If so, it should be modified, I think.
>
> BTW, it seems that the snapshot/clone code of the latest master is
> broken, so I couldn't test your example in my environment at all.
> Does the latest sheepdog code work correctly in your environment?
>
> Thanks,
>
> Kazutaka

In the case where the cluster membership changes (nodes join or leave)
while the deletion work is in progress, the recovery work may migrate
some data objects from one node to another.  remove_object() may then
not find those objects in the current node's copies, so it fails to
delete them, even though the objects have not actually been deleted.
In the old code we would then have no chance to delete them at all,
which leaks those objects.

The snapshot/clone code of the master branch runs well in my
environment; nobody has changed that code recently, and I only modified
the logic around VDI deletion.

thanks,

levin