[sheepdog-users] [PATCH stable-0.8 17/22] sheep: fix error in sheepdog cluster recovery

Mon Feb 24 08:07:05 CET 2014

From: Robin Dong <sanbai at taobao.com>

Sheepdog failed to recover objects when running on a 5-server cluster with
about 20G of data in erasure-code mode.

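The cause is in default_create_and_write(): it rename()s the object into the
data directory and only then sets the ec-index xattr on it. This leaves a time
window in which another process can read the data object but cannot get its
ec-index xattr. That process then reports a get-xattr failure and removes the
disk, because it treats the failure as an I/O error event.

Fix this by setting the ec-index xattr on the temporary file before rename(),
so the object never becomes visible in the data directory without it.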

Signed-off-by: Robin Dong <sanbai at taobao.com>
Signed-off-by: Liu Yuan <namei.unix at gmail.com>
---
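The ordering this patch establishes (write the data, attach the xattr, and
only then rename() into place) can be sketched in isolation. What follows is
a minimal sketch, not sheepdog code: it assumes Linux and a filesystem that
supports user.* extended attributes, and both publish_object() and the
attribute name user.demo.ec_index are hypothetical names chosen for
illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical attribute name, used only for this illustration. */
#define DEMO_XATTR "user.demo.ec_index"

/*
 * Hypothetical helper: create <dir>/<name> so that it never becomes visible
 * without its index xattr. Returns 0 on success, -1 with errno set on error.
 */
static int publish_object(const char *dir, const char *name,
			  const void *buf, size_t len, uint8_t ec_index)
{
	char tmp_path[4096], path[4096];
	ssize_t n;
	int fd, saved;

	snprintf(tmp_path, sizeof(tmp_path), "%s/%s.tmp", dir, name);
	snprintf(path, sizeof(path), "%s/%s", dir, name);

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* 1. Write the payload while the file still has its temporary name. */
	n = write(fd, buf, len);
	if (n < 0)
		goto fail;
	if ((size_t)n != len) {
		errno = EIO;	/* treat a short write as an error */
		goto fail;
	}

	/* 2. Attach the xattr before the file is published. */
	if (fsetxattr(fd, DEMO_XATTR, &ec_index, sizeof(ec_index), 0) < 0)
		goto fail;

	if (close(fd) < 0) {
		fd = -1;
		goto fail;
	}
	fd = -1;

	/* 3. Publish: rename() keeps the inode, so the xattr comes along. */
	if (rename(tmp_path, path) < 0)
		goto fail;

	return 0;

fail:
	saved = errno;
	if (fd >= 0)
		close(fd);
	unlink(tmp_path);	/* do not leave a half-built temp file behind */
	errno = saved;
	return -1;
}
```

A reader that opens the final path either finds no file at all or finds one
that already carries the attribute. On filesystems without user xattr
support, fsetxattr() is expected to fail with ENOTSUP and nothing is
published.
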
 sheep/plain_store.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 9a4871c..c3c6eaf 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -403,16 +403,18 @@ int default_create_and_write(uint64_t oid, const struct siocb *iocb)
 		goto out;
 	}
 
+	if (ec && set_erasure_index(tmp_path, iocb->ec_index) < 0) {
+		ret = err_to_sderr(tmp_path, oid, errno);
+		goto out;
+	}
+
 	ret = rename(tmp_path, path);
 	if (ret < 0) {
 		sd_err("failed to rename %s to %s: %m", tmp_path, path);
 		ret = err_to_sderr(path, oid, errno);
 		goto out;
 	}
-	if (ec && set_erasure_index(path, iocb->ec_index) < 0) {
-		ret = err_to_sderr(path, oid, errno);
-		goto out;
-	}
+
 	ret = SD_RES_SUCCESS;
 	objlist_cache_insert(oid);
 out:
-- 
1.7.10.4