[sheepdog] [PATCH] sheep: open files with O_EXCL when creating objects

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Sep 13 11:30:02 CEST 2012


There is a race condition in default_create_and_write: if node
membership changes during object creation, the gateway and the recovery
process can send CREATE operations for the same object at the same
time.  This causes, for example, the following problem.

  1. gateway request creates a tmp_path file
  2. recovery process also creates the same tmp_path file
  3. gateway request renames the tmp_path to a correct path
  4. recovery process also tries to rename, but it fails because the
     tmp_path doesn't exist
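
The sequence above can be replayed in a single process.  The following is
only a minimal sketch, not sheepdog code: replay_tmp_rename_race() and its
paths are hypothetical names chosen for illustration, and the two writers
are simulated sequentially rather than by real gateway and recovery
threads.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of sheepdog): replays steps 1-4 in order.
 * Returns the errno of the second rename(), 0 if it unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int replay_tmp_rename_race(const char *tmp_path, const char *final_path)
{
	int fd_gateway, fd_recovery, err = 0;

	/* Steps 1 and 2: both writers open the same tmp file.  Without
	 * O_EXCL the second open() succeeds silently. */
	fd_gateway = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd_gateway < 0)
		return -1;
	fd_recovery = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd_recovery < 0) {
		close(fd_gateway);
		unlink(tmp_path);
		return -1;
	}

	/* Step 3: the first writer renames tmp_path to the final path. */
	if (rename(tmp_path, final_path) < 0)
		err = -1;
	/* Step 4: the second rename() no longer finds tmp_path. */
	else if (rename(tmp_path, final_path) < 0)
		err = errno;

	close(fd_gateway);
	close(fd_recovery);
	unlink(tmp_path);	/* harmless if it is already gone */
	unlink(final_path);
	return err;
}
```

Called with two paths in a writable directory, the helper is expected to
report ENOENT for the second rename, which corresponds to step 4.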

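To avoid this problem, this patch opens the tmp_path file with the
O_EXCL flag when creating objects.  If open() fails with EEXIST, another
writer is already creating the same object, so the request returns
success.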
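
The effect of O_EXCL can be seen in isolation with a small sketch.  The
helper create_exclusive() below is a hypothetical name used only for
illustration; it is not the function changed by this patch, and the open
flags are simplified (no O_DIRECT, fixed mode).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of sheepdog).  Returns 0 and stores the
 * descriptor in *fd_out if this caller created the file, EEXIST if
 * another creator got there first, and -1 on any other error. */
int create_exclusive(const char *path, int *fd_out)
{
	/* O_CREAT | O_EXCL makes the existence check and the creation a
	 * single atomic step, so only one caller can win. */
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd < 0)
		return errno == EEXIST ? EEXIST : -1;

	*fd_out = fd;
	return 0;
}
```

When two callers use the same path, the first gets a descriptor and the
second gets EEXIST without touching the file, so only the winner goes on
to write and rename it.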

Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
---
 sheep/plain_store.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 0c543ef..5962ac2 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -28,7 +28,7 @@ static int get_open_flags(uint64_t oid, bool create)
 		flags |= O_DIRECT;
 
 	if (create)
-		flags |= O_CREAT | O_TRUNC;
+		flags |= O_CREAT | O_EXCL;
 
 	return flags;
 }
@@ -282,6 +282,15 @@ int default_create_and_write(uint64_t oid, struct siocb *iocb)
 
 	fd = open(tmp_path, flags, def_fmode);
 	if (fd < 0) {
+		if (errno == EEXIST) {
+			/* This happens if node membership changes during object
+			 * creation; while gateway retries a CREATE request,
+			 * recovery process could also recover the object at the
+			 * same time.  They should try to write the same data,
+			 * so it is okay to simply return success here. */
+			dprintf("%s exists\n", tmp_path);
+			return SD_RES_SUCCESS;
+		}
 		eprintf("failed to open %s: %m\n", tmp_path);
 		return SD_RES_EIO;
 	}
-- 
1.7.2.5



