[sheepdog] question about replica recovery failure caused by oid.tmp file

Ruoyu liangry at ucweb.com
Thu Sep 18 11:28:27 CEST 2014


On 2014-09-18 16:30, Bingpeng Zhu wrote:
> Thank you for the advice.
> default_init() of sheep/store.c already has logic for unlinking
> oid.tmp files. I'm not sure why the oid.tmp file still exists
> in the system.
No, the current logic does not unlink oid.tmp files in default_init().
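
To make the suggestion concrete, here is a minimal sketch of the kind of start-up scan I have in mind. It is not taken from the sheepdog tree: remove_stale_tmp_files() and has_tmp_suffix() are hypothetical names, and the sketch assumes that object files sit directly under a single data directory and that leftover temporary files are regular files whose names end in ".tmp".

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if name ends with ".tmp", 0 otherwise. */
static int has_tmp_suffix(const char *name)
{
	static const char suffix[] = ".tmp";
	size_t nlen = strlen(name);
	size_t slen = sizeof(suffix) - 1;

	return nlen > slen && strcmp(name + nlen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: unlink every regular "*.tmp" file directly under
 * dir (assumed layout, not sheepdog's real one). Returns the number of
 * files removed, or -1 if the directory cannot be read or an unlink fails.
 */
int remove_stale_tmp_files(const char *dir)
{
	char path[PATH_MAX];
	struct dirent *d;
	struct stat st;
	int removed = 0;
	DIR *dp;

	assert(dir != NULL);

	dp = opendir(dir);
	if (!dp)
		return -1;

	while ((d = readdir(dp)) != NULL) {
		int n;

		if (!has_tmp_suffix(d->d_name))
			continue;

		n = snprintf(path, sizeof(path), "%s/%s", dir, d->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path too long, skip it */

		/* Only touch regular files, never directories or symlinks. */
		if (lstat(path, &st) < 0 || !S_ISREG(st.st_mode))
			continue;

		if (unlink(path) < 0) {
			if (errno == ENOENT)
				continue;	/* already gone, nothing to do */
			closedir(dp);
			return -1;
		}
		removed++;
	}

	closedir(dp);
	return removed;
}
```

Something like this would have to run once at initialization, before any gateway or recovery thread can create new .tmp files; otherwise it could remove a temporary file that is still being written.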
>
>
>
> ------------------ Original ------------------
> *From: * "Hitoshi Mitake";<mitake.hitoshi at lab.ntt.co.jp>;
> *Date: * Sep 18, 2014
> *To: * "Ruoyu"<liangry at ucweb.com>;
> *Cc: * "Bingpeng Zhu"<nkuzbp at foxmail.com>; 
> "sheepdog"<sheepdog at lists.wpkg.org>;
> *Subject: * Re: [sheepdog] question about replica recovery failure 
> caused by oid.tmp file
>
> At Tue, 16 Sep 2014 10:10:32 +0800,
> Ruoyu wrote:
> >
> > Thanks Bingpeng.
> > I also encountered this problem.
> > I suggest sheep should scan oid.tmp files and remove them when it is
> > being started.
>
> I agree with Ruoyu's opinion. .tmp files should be deleted at
> initialization time. e.g. default_init() of sheep/store.c would be a
> good place for it.
>
> Thanks,
> Hitoshi
>
> >
> > On 2014-09-15 00:14, Bingpeng Zhu wrote:
> > > Hi, all:
> > >
> > > I have a problem using sheepdog. I created an erasure-coded VDI and
> > > wrote some data to it. Then I unplugged a disk and stopped and
> > > restarted one sheep within a short time. After recovery completed in
> > > the latest epoch, I found that some replica was lost and only the
> > > corresponding oid.tmp file existed in the data directory. I tried to
> > > rebuild the replica using "dog vdi check", but it didn't work. I
> > > think this is caused by the oid.tmp file: I had to delete the oid.tmp
> > > file manually, and only then did "dog vdi check" successfully recover
> > > the lost replica.
> > >
> > > In default_create_and_write() of sheep/plain_store.c, the function
> > > returns success directly if the oid.tmp file exists. I have read the
> > > comment in this function carefully; it says that the gateway and the
> > > recovery thread may try to write the SAME data, so it is okay to
> > > simply return success here. To solve this problem, I want to change
> > > default_create_and_write() so that replica data is written even if
> > > the oid.tmp file exists. If oid.tmp exists, the function should
> > > overwrite it.
> > >
> > > I am not sure whether this change will work well in all scenarios.
> > > In particular, I wonder whether it could lead to old data overwriting
> > > new data, although I haven't been able to think of a scenario in
> > > which that happens. Can someone give me some advice on solving this
> > > problem?
> > >
> > >
> > >
> >
> > --
> > sheepdog mailing list
> > sheepdog at lists.wpkg.org
> > http://lists.wpkg.org/mailman/listinfo/sheepdog
