[sheepdog] question about replica recovery failure caused by oid.tmp file

Ruoyu liangry at ucweb.com
Tue Sep 16 04:10:32 CEST 2014


Thanks Bingpeng.
I also encountered this problem.
I suggest that sheep scan for leftover oid.tmp files and remove them
when it starts.

On 2014-09-15 00:14, Bingpeng Zhu wrote:
> Hi, all:
>     I have a problem using sheepdog. I created an erasure-coded VDI and
> wrote some data to it. Then I unplugged a disk and stopped/restarted one
> sheep within a short time. After recovery completed in the latest epoch,
> I found that a replica was lost and only the corresponding oid.tmp file
> existed in the data directory. I tried to rebuild the replica using
> "dog vdi check", but it didn't work. I think this is caused by the
> oid.tmp file. I had to delete the oid.tmp file manually, and then
> "dog vdi check" successfully recovered the lost replica.
>     The function default_create_and_write() in sheep/plain_store.c
> returns success directly if the oid.tmp file exists. I have read the
> comment in this function carefully; it says the gateway and the recovery
> thread may try to write the SAME data, so it is okay to simply return
> success here. To solve this problem, I want to change
> default_create_and_write() so that the replica data is written even if
> the oid.tmp file exists. If oid.tmp exists, the function should
> overwrite it.
>     I am not sure whether this change will work well in all scenarios.
> In particular, I wonder whether it could lead to old data overwriting
> new data, but I haven't been able to think of any scenario where that
> would happen. Can someone give me some advice on solving this problem?
>
>
>
