[sheepdog-users] .stale directory was removed after cluster format

Tue Dec 16 09:24:18 CET 2014

On Tue, Dec 16, 2014 at 3:38 PM, Saeki Masaki <saeki.masaki at po.ntts.co.jp>
wrote:
>
> (2014/12/16 15:58), Jinzhi Chen wrote:
>
>> In the `cluster_make_fs` function (sheep/ops.c),
>> sheep first calls sd_store->format(), then calls sd_store->init().
>> It seems that sheep first removes the .stale dir, then creates the .stale
>> dir again.
>> Am I right?
>>
>> Then I ran some tests on cluster format:
>> format the cluster first, then check whether the .stale dir was created.
>> During my tests, .stale was sometimes created and sometimes not,
>> which is very odd.
>>
>>
>> Then I added a sync() call between `format()` and `init()`, and it seems
>> to work: every cluster format now creates the .stale dir.
>>
>> -----
>> diff --git a/sheep/ops.c b/sheep/ops.c
>> index 9eb3280..204d586 100644
>> --- a/sheep/ops.c
>> +++ b/sheep/ops.c
>> @@ -272,6 +272,7 @@ static int cluster_make_fs(const struct sd_req *req, struct sd_rsp *rsp,
>>          if (ret != SD_RES_SUCCESS)
>>                  return ret;
>>
>> +       sync();
>>          ret = sd_store->init();
>>          if (ret != SD_RES_SUCCESS)
>>                  return ret;
>>
>>
>>
>> What do you think?
>>
>
> I don't know the sync() system call well enough,
> so it may be effective.
>
> However, I think it is not enough.
> I have seen the following messages
> (https://bugs.launchpad.net/sheepdog-project/+bug/1402887):
>
> /var/log/sheep1/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
> /var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data1/obj/.stale
> /var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data1/obj/.stale
>
> sd_store->init() and sd_store->format() are executed sequentially in the
> main thread, but rmdir_r is executed in a worker thread (named [util]).
>
Sorry, I didn't notice that, so I thought it might be an issue with
filesystem operation caching.

> So if the worker thread runs late, it will occur again.
>
Yeah, you are right.
We need to make sure that `[util] rmdir_r` and the `main` thread execute
sequentially.
Maybe an eventfd for thread communication is needed.
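
The eventfd idea could look roughly like the sketch below. It is not taken
from the sheepdog source: `cleanup_ctx`, `cleanup_worker` and
`format_then_init` are hypothetical names, and a plain rmdir() of an empty
directory stands in for the recursive rmdir_r() run by the [util] worker.
The only point is that the main thread blocks on the eventfd until the
worker reports that the removal is finished, and only then recreates the
directory. It assumes Linux (eventfd) and pthreads.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical context shared between the main thread and the worker. */
struct cleanup_ctx {
	const char *path;	/* directory to remove, e.g. ".../obj/.stale" */
	int efd;		/* eventfd signalled once the removal is done */
};

/* Worker side: stands in for the [util] thread running rmdir_r(). */
static void *cleanup_worker(void *arg)
{
	struct cleanup_ctx *ctx = arg;
	uint64_t one = 1;

	if (rmdir(ctx->path) < 0 && errno != ENOENT)
		perror("rmdir");

	/* Wake the main thread whether or not the removal succeeded. */
	if (write(ctx->efd, &one, sizeof(one)) != sizeof(one))
		perror("eventfd write");
	return NULL;
}

/*
 * Main-thread side: start the removal, wait for it to finish, and only
 * then recreate the directory.
 * Returns 0 if the directory exists at the end, -1 otherwise.
 */
int format_then_init(const char *path)
{
	struct cleanup_ctx ctx = { .path = path, .efd = eventfd(0, 0) };
	pthread_t tid;
	uint64_t count;
	struct stat st;
	ssize_t n;
	int ret = -1;

	if (ctx.efd < 0)
		return -1;

	if (pthread_create(&tid, NULL, cleanup_worker, &ctx) != 0)
		goto out;

	/* Blocks until the worker has written to the eventfd. */
	n = read(ctx.efd, &count, sizeof(count));
	pthread_join(tid, NULL);
	if (n != sizeof(count))
		goto out;

	if (mkdir(path, 0755) < 0 && errno != EEXIST)
		goto out;

	if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
		ret = 0;
out:
	close(ctx.efd);
	return ret;
}
```

With this ordering the mkdir can never be overtaken by the rmdir, no matter
how late the worker thread is scheduled. In sheep itself the wait would
probably have to fit into the existing work queue and event loop rather
than block the main thread like this.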

> Regards,
> Saeki.
>
>
>
>
>>
>> Thanks,
>> Jinzhi Chen
>>
>> On Tue, Dec 16, 2014 at 10:28 AM, Saeki Masaki <
>> saeki.masaki at po.ntts.co.jp>
>> wrote:
>>
>>>
>>> I posted a bug in launchpad:
>>>    https://bugs.launchpad.net/sheepdog-project/+bug/1402887
>>>
>>> After executing cluster format, the .stale directory was removed.
>>> I think that has a bad influence on automatic recovery.
>>>
>>> Regards, Saeki.
>>>
>>>
>>>
>>
>
>