[sheepdog] question about replica recovery failure caused by oid.tmp file

Bingpeng Zhu nkuzbp at foxmail.com
Sun Sep 14 18:14:28 CEST 2014


Hi, all:
     I have a problem using sheepdog. I created an erasure-coded VDI and wrote
  some data to it. Then I unplugged a disk and stopped and restarted one sheep within a
  short time. After recovery completed in the latest epoch, I found that a replica was
  lost and only the corresponding oid.tmp file existed in the data directory. I tried
  to rebuild the replica using "dog vdi check", but it didn't work. I think this is
  caused by the oid.tmp file: I had to delete the oid.tmp file manually, and only then
  did "dog vdi check" successfully recover the lost replica.
      default_create_and_write() in sheep/plain_store.c returns success
  immediately if the oid.tmp file already exists. I have read the comment in this function
  carefully; it says that the gateway and the recovery thread may try to write the SAME data,
  so it is okay to simply return success here. To solve this problem, I want to change
  default_create_and_write() so that the replica data is written even if the oid.tmp file
  exists: if oid.tmp exists, the function should overwrite it.

     I am not sure whether this change will work well in all scenarios. In particular, I am
  worried that it could lead to old data overwriting new data, although I have not yet thought
  of any scenario in which that would happen. Can someone give me some advice on how to solve
  this problem?