[sheepdog] question about replica recovery failure caused by oid.tmp file

Hitoshi Mitake mitake.hitoshi at lab.ntt.co.jp
Fri Sep 19 04:40:57 CEST 2014


At Fri, 19 Sep 2014 10:28:19 +0800,
Robin Dong wrote:
> 
> default_init() --> for_each_object_in_wd() --> thread_process_path()
> --> for_each_object_in_path() and the code for for_each_object_in_path():
> 
> ......
>                 if (is_tmp_dentry(d->d_name)) {
>                         if (cleanup) {
>                                 snprintf(file_name, sizeof(file_name),
>                                                 "%s/%s", path, d->d_name);
>                                 sd_debug("remove tmp object %s", file_name);
>                                 if (unlink(file_name) < 0)
>                                         sd_err("failed to unlink %s: %m",
>                                                         file_name);
>                         }
>                         continue;
>                 }
> ......
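The excerpt above removes leftover .tmp files while sheep walks its object directories (when the cleanup flag is set, as on the default_init() path Robin describes). As a standalone illustration of the same idea, here is a minimal sketch that scans a single directory and unlinks every entry whose name ends in ".tmp". It is not sheepdog's code: has_tmp_suffix() and remove_tmp_files() are hypothetical names, and a flat object directory is assumed (sheep's real walk goes through for_each_object_in_wd() over every working directory).

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if name ends in ".tmp" (and is longer than it). */
static int has_tmp_suffix(const char *name)
{
	static const char suffix[] = ".tmp";
	size_t len = strlen(name);
	size_t slen = sizeof(suffix) - 1;

	return len > slen && strcmp(name + len - slen, suffix) == 0;
}

/*
 * Hypothetical helper: unlink every "*.tmp" entry directly under path.
 * Returns the number of files removed, or -1 if path cannot be opened.
 * Unlink failures are reported and skipped, like the quoted sheep code.
 */
static int remove_tmp_files(const char *path)
{
	char file_name[4096];
	struct dirent *d;
	int removed = 0;
	DIR *dir;

	assert(path != NULL);
	dir = opendir(path);
	if (!dir) {
		fprintf(stderr, "failed to open %s: %s\n", path, strerror(errno));
		return -1;
	}

	while ((d = readdir(dir)) != NULL) {
		if (!has_tmp_suffix(d->d_name))
			continue;
		snprintf(file_name, sizeof(file_name), "%s/%s", path, d->d_name);
		if (unlink(file_name) < 0)
			fprintf(stderr, "failed to unlink %s: %s\n",
				file_name, strerror(errno));
		else
			removed++;
	}
	closedir(dir);
	return removed;
}
```

Running this kind of cleanup once at startup, before sheep serves any I/O, matters because at that point no write can legitimately own a .tmp file yet; doing it while requests are in flight could delete a file another writer is still filling.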

Oops! Thanks for your explanation, Robin. I misunderstood the code.
Sorry, Alibaba guys!

> 
> sheep should remove .tmp files when it starts; otherwise it will not work correctly.
> Ruoyu, the full path bug was fixed by you :)

Hmm, I wonder why the .tmp files weren't deleted in Bingpeng's test...

Thanks,
Hitoshi

> 
> 
> 
> 2014-09-18 17:28 GMT+08:00 Ruoyu <liangry at ucweb.com>:
> 
> >
> > On 2014-09-18 16:30, Bingpeng Zhu wrote:
> >
> > Thank you for the advice.
> > default_init() of sheep/store.c already has logic for unlinking oid.tmp
> > files, so I'm not sure why the oid.tmp file still exists in the
> > system.
> >
> > No. The current logic does not unlink oid.tmp files in default_init().
> >
> >
> >
> >
> >  ------------------ Original ------------------
> > *From:* "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>
> > *Date:* Sep 18, 2014
> > *To:* "Ruoyu" <liangry at ucweb.com>
> > *Cc:* "Bingpeng Zhu" <nkuzbp at foxmail.com>; "sheepdog" <sheepdog at lists.wpkg.org>
> > *Subject:* Re: [sheepdog] question about replica recovery failure caused
> > by oid.tmp file
> >
> >  At Tue, 16 Sep 2014 10:10:32 +0800,
> > Ruoyu wrote:
> > >
> > > Thanks Bingpeng.
> > > I also encountered this problem.
> > > I suggest that sheep scan for oid.tmp files and remove them when it
> > > starts.
> >
> > I agree with Ruoyu's opinion. The .tmp files should be deleted at
> > initialization time. For example, default_init() of sheep/store.c would
> > be a good place for it.
> >
> > Thanks,
> > Hitoshi
> >
> > >
> > > On 2014-09-15 00:14, Bingpeng Zhu wrote:
> > > > Hi, all:
> > > >     I have a problem using sheepdog. I created an erasure-coded VDI
> > > > and wrote some data to it. Then I unplugged a disk and stopped and
> > > > restarted one sheep within a short time. After recovery completed in
> > > > the latest epoch, I found that some replicas were lost and only the
> > > > corresponding oid.tmp files existed in the data directory. I tried to
> > > > rebuild the replicas using "dog vdi check", but it didn't work. I
> > > > think this is caused by the oid.tmp file. I had to delete the oid.tmp
> > > > file manually, and then "dog vdi check" successfully recovered the
> > > > lost replica.
> > > >     The function default_create_and_write() in sheep/plain_store.c
> > > > returns success directly if the oid.tmp file exists. I have read the
> > > > comment in this function carefully: it says the gateway and the
> > > > recovery thread may try to write the SAME data, so it is okay to
> > > > simply return success here. To solve this problem, I want to change
> > > > default_create_and_write() so that the replica data is written even if
> > > > the oid.tmp file exists; if oid.tmp exists, the function should
> > > > overwrite it.
> > > > I am not sure whether this change will work well in all scenarios. In
> > > > particular, I wonder whether it could lead to old data overwriting new
> > > > data, but I haven't thought of any scenario that would. Can someone
> > > > give me some advice on solving this problem?
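To make the trade-off in Bingpeng's question concrete, here is a simplified model of the create path, assuming (as the thread implies) that an object is first written to "<oid>.tmp" and renamed to "<oid>" once complete. It is a sketch, not sheepdog's default_create_and_write(): create_and_write() and its overwrite flag are hypothetical. With overwrite == 0 it follows the behaviour Bingpeng describes (an existing .tmp means "return success without writing"); with overwrite != 0 it follows his proposal (truncate and rewrite the .tmp).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical model: write buf to "<path>.tmp", then rename it to "<path>".
 * overwrite == 0: O_EXCL; an existing .tmp is assumed to belong to another
 *                 writer of the same data, so return success without writing.
 * overwrite != 0: O_TRUNC; a leftover .tmp is truncated and rewritten.
 * Returns 0 on success, -1 on error.
 */
static int create_and_write(const char *path, const void *buf, size_t len,
			    int overwrite)
{
	char tmp_path[4096];
	int flags = O_WRONLY | O_CREAT | (overwrite ? O_TRUNC : O_EXCL);
	int fd;

	assert(path != NULL);
	snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	fd = open(tmp_path, flags, 0644);
	if (fd < 0)
		return errno == EEXIST ? 0 : -1; /* EEXIST only occurs with O_EXCL */

	if (write(fd, buf, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp_path);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp_path, path) < 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}
```

With a stale .tmp left behind by an interrupted write, the exclusive variant keeps returning success while the object file never appears, which matches the symptom reported above. The overwrite variant clears that trap but gives up the exclusivity that O_EXCL provided: if the gateway and the recovery thread really do write concurrently, both can truncate and write the same .tmp file, which is the risk Bingpeng is asking about. Removing stale .tmp files at startup, as Ruoyu and Hitoshi suggest, avoids weakening that check.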
> > > >
> > > >
> > > >
> > >
> > > --
> > > sheepdog mailing list
> > > sheepdog at lists.wpkg.org
> > > http://lists.wpkg.org/mailman/listinfo/sheepdog
> >
> 
> 
> -- 
> Best regards,
> Robin Dong


