[sheepdog] question about replica recovery failure caused by oid.tmp file

Ruoyu liangry at ucweb.com
Fri Sep 19 05:12:42 CEST 2014


On 2014-09-19 10:40, Hitoshi Mitake wrote:
> At Fri, 19 Sep 2014 10:28:19 +0800,
> Robin Dong wrote:
>> The call chain is default_init() --> for_each_object_in_wd() -->
>> thread_process_path() --> for_each_object_in_path(), and
>> for_each_object_in_path() contains this code:
>>
>> ......
>>                 if (is_tmp_dentry(d->d_name)) {
>>                         if (cleanup) {
>>                                 snprintf(file_name, sizeof(file_name),
>>                                                 "%s/%s", path, d->d_name);
>>                                 sd_debug("remove tmp object %s", file_name);
>>                                 if (unlink(file_name) < 0)
>>                                         sd_err("failed to unlink %s: %m",
>>                                                         file_name);
>>                         }
>>                         continue;
>>                 }
>> ......
> Oops! Thanks for your explanation, Robin. I misunderstood the code,
> sorry Alibaba guys!
>
>> sheep should remove .tmp files when it starts, or it will not work correctly.
>> Ruoyu, the full path bug was fixed by you :)
Sorry, everyone. I am amazed that I forgot this patch! :)
Thanks for the explanation, Robin.
Bingpeng, can you reproduce the problem?
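
For reference, the startup cleanup that Robin quoted can be sketched as a
standalone function. This is only an illustration under stated assumptions,
not the sheepdog source: has_tmp_suffix() and remove_stale_tmp_files() are
hypothetical names, and I assume a temporary object is any directory entry
whose name ends in ".tmp".

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if the entry name ends in ".tmp". */
static bool has_tmp_suffix(const char *name)
{
	size_t len = strlen(name);

	return len >= 4 && strcmp(name + len - 4, ".tmp") == 0;
}

/*
 * Hypothetical helper: unlink every "*.tmp" entry directly under dir_path.
 * Returns the number of files removed, or -1 if the directory cannot be
 * opened.
 */
static int remove_stale_tmp_files(const char *dir_path)
{
	char file_name[4096];
	struct dirent *d;
	int removed = 0;
	DIR *dir;

	assert(dir_path != NULL);

	dir = opendir(dir_path);
	if (!dir)
		return -1;

	while ((d = readdir(dir)) != NULL) {
		int n;

		if (!has_tmp_suffix(d->d_name))
			continue;

		/*
		 * Build the full path: unlink() on the bare d_name would be
		 * resolved against the current working directory instead.
		 */
		n = snprintf(file_name, sizeof(file_name), "%s/%s",
			     dir_path, d->d_name);
		if (n < 0 || (size_t)n >= sizeof(file_name))
			continue; /* path does not fit, skip this entry */

		if (unlink(file_name) < 0)
			fprintf(stderr, "failed to unlink %s: %s\n",
				file_name, strerror(errno));
		else
			removed++;
	}

	closedir(dir);
	return removed;
}
```

Calling remove_stale_tmp_files() once per object directory before the store
accepts any requests would match the behaviour discussed in this thread.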
> Hmm, I wonder why .tmp files aren't deleted in the test of Bingpeng...
>
> Thanks,
> Hitoshi
>
>>
>>
>> 2014-09-18 17:28 GMT+08:00 Ruoyu <liangry at ucweb.com>:
>>
>>> On 2014-09-18 16:30, Bingpeng Zhu wrote:
>>>
>>>  Thank you for the advice.
>>> default_init() in sheep/store.c already has logic to unlink oid.tmp
>>> files. I'm not sure why the oid.tmp file still exists on the
>>> system.
>>>
>>> No. The current logic does not unlink oid.tmp files in default_init().
>>>
>>>
>>>
>>>
>>>  ------------------ Original ------------------
>>>  *From: * "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>
>>> *Date: * Sep 18, 2014
>>> *To: * "Ruoyu" <liangry at ucweb.com>
>>> *Cc: * "Bingpeng Zhu" <nkuzbp at foxmail.com>; "sheepdog" <sheepdog at lists.wpkg.org>
>>> *Subject: * Re: [sheepdog] question about replica recovery failure caused
>>> by oid.tmp file
>>>
>>>  At Tue, 16 Sep 2014 10:10:32 +0800,
>>> Ruoyu wrote:
>>>> Thanks Bingpeng.
>>>> I also encountered this problem.
>>>> I suggest that sheep scan for oid.tmp files and remove them at
>>>> startup.
>>> I agree with Ruoyu's opinion. .tmp files should be deleted at
>>> initialization time. e.g. default_init() of sheep/store.c would be a
>>> good place for it.
>>>
>>> Thanks,
>>> Hitoshi
>>>
>>>> On 2014-09-15 00:14, Bingpeng Zhu wrote:
>>>>> Hi, all:
>>>>>      I have a problem using sheepdog. I created an erasure-coded VDI
>>>>>   and wrote some data to it. Then I unplugged a disk and stopped and
>>>>>   restarted one sheep within a short time. After recovery completed in
>>>>>   the latest epoch, I found that a replica was lost and only the
>>>>>   corresponding oid.tmp file existed in the data directory. I tried to
>>>>>   rebuild the replica using "dog vdi check", but it didn't work. I
>>>>>   think this is caused by the oid.tmp file. I had to delete the oid.tmp
>>>>>   file manually, and then "dog vdi check" successfully recovered the
>>>>>   lost replica.
>>>>>       The function default_create_and_write() in sheep/plain_store.c
>>>>>   returns success immediately if the oid.tmp file exists. I have read
>>>>>   the comment in this function carefully; it says the gateway and the
>>>>>   recovery thread may try to write the SAME data, so it is okay to
>>>>>   simply return success here. To solve this problem, I want to change
>>>>>   default_create_and_write() so that the replica data is written even
>>>>>   if the oid.tmp file exists: if oid.tmp exists, the function should
>>>>>   overwrite it.
>>>>>       I am not sure whether this change will work well in all scenarios.
>>>>>   In particular, I wonder whether it could lead to old data overwriting
>>>>>   new data, but I haven't thought of any scenario where that would
>>>>>   happen. Can someone give me some advice on solving this problem?
>>>>>
>>>>>
>>>>>
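
The behaviour Bingpeng describes can be illustrated with a minimal sketch.
This is not the code of default_create_and_write(): the function name
create_and_write_sketch(), the use of O_EXCL, and the EEXIST handling are
all assumptions for illustration, based only on the description above. It
shows why a stale oid.tmp left by a crash is indistinguishable from a
concurrent writer, so the object is reported as written but never appears.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical simplification of a create-and-write path: write the data
 * to "<path>.tmp" first, then rename it to <path>.
 * Returns 0 on success (including the "tmp already exists" case) and -1
 * on error.
 */
static int create_and_write_sketch(const char *path, const void *buf,
				   size_t len)
{
	char tmp_path[4096];
	ssize_t written;
	int fd, n;

	assert(path != NULL && buf != NULL);

	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp_path))
		return -1;

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0) {
		/*
		 * Assumed behaviour: an existing tmp file is taken to mean
		 * that another thread is writing the same data. A stale tmp
		 * file left over from a crash takes this branch as well.
		 */
		if (errno == EEXIST)
			return 0;
		return -1;
	}

	written = write(fd, buf, len);
	if (close(fd) < 0 || written != (ssize_t)len) {
		unlink(tmp_path);
		return -1;
	}

	/* Publish the object atomically under its final name. */
	if (rename(tmp_path, path) < 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}
```

With a leftover "<path>.tmp" in place, every call returns 0 without creating
<path>. Once the stale file is removed, the same call creates the object,
which is consistent with the manual workaround described above.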
>>>> --
>>>> sheepdog mailing list
>>>> sheepdog at lists.wpkg.org
>>>> http://lists.wpkg.org/mailman/listinfo/sheepdog
>>>
>>>
>>
>> -- 
>> --
>> Best Regard
>> Robin Dong




