[sheepdog] [PATCH v2 3/3] object list cache: reclaim object list cache when receiving a deletion event.
levin li
levin108 at gmail.com
Thu Jul 19 03:13:51 CEST 2012
On 07/18/2012 09:23 PM, MORITA Kazutaka wrote:
> At Wed, 18 Jul 2012 16:05:46 +0800,
> levin li wrote:
>>
>> On 07/18/2012 03:49 PM, MORITA Kazutaka wrote:
>>> At Wed, 18 Jul 2012 15:25:16 +0800,
>>> levin li wrote:
>>>>>>> I'm still a bit confused why we need both objlist_cache_delete and
>>>>>>> objlist_cache_remove. If no recovery happens during object deletion,
>>>>>>> we don't need objlist_cache_delete because objlist_cache_remove
>>>>>>> removes the entry. objlist_cache_delete is necessary for the case
>>>>>>> object deletion and recovery happen at the same time to clean up
>>>>>>> unhandled entries by objlist_cache_remove. Is it correct?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, if no recovery happens before, objlist_cache_remove() can always
>>>>>> delete the entry in object list cache while VDI deletion, but once recovery
>>>>>> happens, it's something different.
>>>>>>
>>>>>> For example, obj A and B stay in node n1 at first, but after a recovery,
>>>>>> A may be migrated from n1 to n2, and B stays in n1, but the objlist entry
>>>>>
>>>>> Is there any reason we don't remove the objlist entry when we unlink
>>>>> the object from the farm working directory?
>>>>>
>>>>> diff --git a/sheep/farm/trunk.c b/sheep/farm/trunk.c
>>>>> index b45427d..e390a63 100644
>>>>> --- a/sheep/farm/trunk.c
>>>>> +++ b/sheep/farm/trunk.c
>>>>> @@ -297,6 +297,7 @@ int trunk_file_write_recovery(unsigned char *outsha1)
>>>>> goto out;
>>>>> }
>>>>> dprintf("remove file %"PRIx64"\n", entry->raw.oid);
>>>>> + objlist_cache_remove(oid);
>>>>> put_entry(entry);
>>>>> }
>>>>> }
>>>>>
>>>>> Recovery algorithm has been changed from before, so probably I'm
>>>>> missing something. Correct me if I'm wrong.
>>>>>
>>>>
>>>> We does call objlist_cache_remove() when deleting a specified object, but
>>>> objlist_cache_remove() can only delete the specified objlist entry on local
>>>> node, but as I explained before the objlist may not stay on the local node
>>>> because of recovery migrating the object without migrating the objlist entry.
>>>
>>> After migrating the objects to other nodes, they will be removed from
>>> the farm working directory. If we remove the objlist entry when
>>> removing the objects from the working directory, there is no objlist
>>> entry whose oid doesn't exists on local node, isn't it.
>>
>> That is the problem, we can't remove the objlist entry as soon as we remove the
>> object from farm working directory, if node A end recovery before some other nodes,
>> and then it delete the stale object from the farm working directory, but if it
>> also delete the objlist entry, it may cause problem, try thinking of another node B
>> which issues a get_obj_list() request after the objlist entry is deleted on the
>> original node A, but still not added to the target node C, then node B would not find
>> the objlist entry, then for node B, this object is ignored to recovery, so it's lost.
>
> That brings me another question. The farm driver may remove objects
> from the working directory before they are migrated to other nodes.
> Why is it not a problem.
>
Before farm driver remove objects from working directory, it backup the object in trunk
files, if recovery process can not find the object in working directory, it would go to
trunk to find the object to recover.
>>
>>
>>>
>>> Objlist cache should be a cache of oid list on the local farm working
>>> directory, no? I'm confused with what the objlist cache is caching.
>>>
>>
>> It does be the oid list, thinking of the time without object list cache, no matter simple
>> store or farm never delete the stale object, so no such problems before, but as we cleanup
>> the working directory, the problem comes, that's another reason a objlist cache is needed.
>
> Thank you, I understand why this patch is necessary. Shouldn't we
> document this on the wiki or in the source code? I think the reason
> why objlist cache must contain stale objects is not obvious without
> explanation.
It's indeed not obvious, I'd like to comment in source code.
thanks,
levin
>
> Thanks,
>
> Kazutaka
>
More information about the sheepdog
mailing list