[sheepdog] [PATCH v1 1/2] sheep: fix error in sheepdog cluster recovery
Robin Dong
robin.k.dong at gmail.com
Thu Feb 13 10:23:06 CET 2014
From: Robin Dong <sanbai at taobao.com>
Sheepdog failed to recover objects when running on a 5-server cluster with
about 20G of data in erasure-code mode.
The cause is in default_create_and_write(): it rename()s the object into the
data directory and only then sets the ec-index xattr on it. This leaves a time
window in which another process can read the data object but cannot get its
ec-index xattr. That process then reports a get-xattr failure and removes the
disk, because it treats the failure as an I/O error event.
Fix this by setting the ec-index xattr on the temporary file before rename(),
so the object never becomes visible under its final path without the xattr.
Signed-off-by: Robin Dong <sanbai at taobao.com>
---
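For reviewers who want to see the ordering in isolation, here is a minimal
standalone sketch of the "set xattr on the temporary file, then rename()"
pattern. It is not sheepdog code: publish_object(), fail_and_cleanup() and the
DEMO_XATTR attribute name are hypothetical and exist only for this
illustration, and it assumes a Linux filesystem that supports user xattrs.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical attribute name, chosen for this sketch only. */
#define DEMO_XATTR "user.demo.ec_index"

/* Remove the temporary file without losing the original errno. */
static int fail_and_cleanup(const char *tmp_path)
{
	int saved = errno;

	unlink(tmp_path);
	errno = saved;
	return -1;
}

/*
 * Hypothetical helper: write buf to tmp_path, tag it with ec_index,
 * and only then make it visible as path. Returns 0 on success and
 * -1 with errno set on failure.
 */
static int publish_object(const char *tmp_path, const char *path,
			  const void *buf, size_t len, uint8_t ec_index)
{
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	ssize_t done;
	int saved;

	if (fd < 0)
		return -1;

	done = write(fd, buf, len);
	if (done < 0 || (size_t)done != len) {
		if (done >= 0)
			errno = EIO;	/* treat a short write as an I/O error */
		saved = errno;
		close(fd);
		errno = saved;
		return fail_and_cleanup(tmp_path);
	}
	if (close(fd) < 0)
		return fail_and_cleanup(tmp_path);

	/* Tag the file while readers still cannot find it under 'path'. */
	if (setxattr(tmp_path, DEMO_XATTR, &ec_index, sizeof(ec_index), 0) < 0)
		return fail_and_cleanup(tmp_path);

	/*
	 * The xattr lives on the inode, so it survives rename(). A reader
	 * that opens 'path' sees either no file or a fully tagged one.
	 */
	if (rename(tmp_path, path) < 0)
		return fail_and_cleanup(tmp_path);

	return 0;
}
```

With this ordering a failed setxattr() leaves nothing under the final path, so
a concurrent reader cannot mistake a half-published object for a disk error.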
sheep/plain_store.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 90ef0a6..754a25a 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -431,16 +431,18 @@ int default_create_and_write(uint64_t oid, const struct siocb *iocb)
 		goto out;
 	}
+	if (ec && set_erasure_index(tmp_path, iocb->ec_index) < 0) {
+		ret = err_to_sderr(tmp_path, oid, errno);
+		goto out;
+	}
+
 	ret = rename(tmp_path, path);
 	if (ret < 0) {
 		sd_err("failed to rename %s to %s: %m", tmp_path, path);
 		ret = err_to_sderr(path, oid, errno);
 		goto out;
 	}
-	if (ec && set_erasure_index(path, iocb->ec_index) < 0) {
-		ret = err_to_sderr(path, oid, errno);
-		goto out;
-	}
+
 	ret = SD_RES_SUCCESS;
 	objlist_cache_insert(oid);
 out:
--
1.7.12.4