[sheepdog] question about replica recovery failure caused by oid.tmp file
Bingpeng Zhu
nkuzbp at foxmail.com
Sun Sep 14 18:14:28 CEST 2014
Hi, all:
I have a problem using sheepdog. I created an erasure-coded VDI and wrote
some data to it. Then I unplugged a disk and stopped and restarted one sheep within
a short time. After recovery completed in the latest epoch, I found that a replica was
lost and only the corresponding oid.tmp file existed in the data directory. I tried
to rebuild the replica using "dog vdi check", but it didn't work. I think this is
caused by the oid.tmp file: I had to delete the oid.tmp file manually, and only then did
"dog vdi check" successfully recover the lost replica.
The function default_create_and_write() in sheep/plain_store.c returns
success immediately if the oid.tmp file already exists. I have read the comment in this function carefully;
it says that the gateway and the recovery thread may try to write the SAME data, so it is okay
to simply return success here. To solve this problem, I want to change
default_create_and_write() so that the replica data is written even if the oid.tmp file exists:
if oid.tmp exists, the function should overwrite it.
I am not sure whether this change will work well in all scenarios. In particular, I wonder whether
it could lead to old data overwriting new data, although I have not been able to think of a scenario
in which that would happen. Can someone give me some advice on how to solve this problem?
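
For illustration, the proposed change could look roughly like the sketch below.
This is not the actual sheepdog code: the helper name create_and_write_overwrite()
is hypothetical, and the sketch assumes that the existing check is an exclusive
create of the tmp file that treats "already exists" as success. Here the tmp file
is opened with O_TRUNC instead, so a stale oid.tmp left behind by an interrupted
write is overwritten and then renamed into place.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not sheepdog's default_create_and_write()):
 * write buf to "<path>.tmp", overwriting any leftover tmp file,
 * then rename it to path. Returns 0 on success or -errno on failure.
 */
static int create_and_write_overwrite(const char *path,
                                      const void *buf, size_t len)
{
    char tmp[4096];
    size_t done = 0;
    int fd, err;

    assert(path != NULL && buf != NULL);

    if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
        return -ENAMETOOLONG;

    /* O_TRUNC rather than O_EXCL: an existing tmp file is reused, not skipped. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -errno;

    while (done < len) {
        ssize_t n = write(fd, (const char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            err = errno;
            goto fail;
        }
        done += (size_t)n;
    }

    /* Flush the data before the rename makes the object visible. */
    if (fsync(fd) < 0) {
        err = errno;
        goto fail;
    }
    if (close(fd) < 0) {
        err = errno;
        unlink(tmp);
        return -err;
    }

    /* rename() replaces path atomically on the same filesystem. */
    if (rename(tmp, path) < 0) {
        err = errno;
        unlink(tmp);
        return -err;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -err;
}
```

Note that this sketch does nothing to serialize two concurrent writers of the
same tmp file. If the gateway and the recovery thread could ever write different
versions of the object at the same time, that is exactly the case where old data
might overwrite new data, so it would need separate handling.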