[sheepdog] [PATCH] object cache: fix a race problem

Liu Yuan namei.unix at gmail.com
Sun Jun 3 13:49:40 CEST 2012


From: Liu Yuan <tailai.ly at taobao.com>

Fix the following problem:
...
Jun 03 18:39:53 do_local_io(52) 2, ac1a3e000019b7 , 1
Jun 03 18:39:53 object_cache_pull(529) oid ac1a3e000019b7pulled successfully
Jun 03 18:39:53 object_cache_pull(529) oid ac1a3e000019b7pulled successfully
Jun 03 18:39:53 create_cache_object(451) 000019b7 already created
Jun 03 18:39:53 object_cache_rw(415) 000019b7, len 4096, off 1048576
Jun 03 18:39:53 read_cache_object(396) size 0, count:4096, offset 1048576 File exists
Jun 03 18:39:53 do_gateway_request(308) failed: 2, ac1a3e000019b7 , 1, 3
Jun 03 18:39:53 gateway_op_done(151) leaving sheepdog cluster
...

The problem: suppose two cloned VMs (A and B) read the same COW oid at the same time:

       A                            B

object_cache_pull() {        object_cache_pull() {
  create_cache_object() {      create_cache_object() {
    open(oid);
                                  open(oid) {
                                    oid_already_opened() {
                                      goto out;
                                    }
                                  }
                               }
                             }
                             read_cache_object() {
                               read_size != requested_length;
                               return EIO;
                             }
    write(oid);
  }
}
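
The interleaving above can be replayed deterministically in a single thread. This is only a rough sketch, not sheepdog code: replay_race() and the path passed to it are hypothetical, and it assumes the cache file is created with O_CREAT | O_EXCL (which is what the EEXIST check in create_cache_object() suggests). It shows that B's read returns 0 bytes because A has created the file but not yet written it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: replay the A/B interleaving in one thread.
 * Returns the byte count B's read sees, or -1 on an unexpected error. */
static ssize_t replay_race(const char *path, size_t len)
{
	char *data = calloc(1, len);
	char *out = malloc(len);
	ssize_t seen = -1;
	int fd_a = -1, fd_b = -1;

	if (!data || !out)
		goto done;
	unlink(path);

	/* A: creates the file; it now exists but is still empty. */
	fd_a = open(path, O_CREAT | O_EXCL | O_RDWR, 0644);
	if (fd_a < 0)
		goto done;

	/* B: the same exclusive create fails with EEXIST ... */
	fd_b = open(path, O_CREAT | O_EXCL | O_RDWR, 0644);
	if (fd_b >= 0 || errno != EEXIST)
		goto done;

	/* ... so B assumes the object is cached and reads it right away. */
	fd_b = open(path, O_RDONLY);
	if (fd_b < 0)
		goto done;
	seen = read(fd_b, out, len);	/* nothing has been written yet */

	/* A: only now writes the data. */
	if (write(fd_a, data, len) != (ssize_t)len)
		seen = -1;
done:
	if (fd_a >= 0)
		close(fd_a);
	if (fd_b >= 0)
		close(fd_b);
	unlink(path);
	free(data);
	free(out);
	return seen;
}
```

Calling replay_race() with a scratch path and a length of 4096 corresponds to the "size 0, count:4096" line in the log: the reader asked for 4096 bytes and got none.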

The fix looks more like a workaround; I would be happy to see a better fix.
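
One possible alternative, given only as a sketch and not as what this patch does: write the object to a temporary name and link() it into place, so the final name never refers to a partially written file. publish_object() and the ".tmp.<pid>" naming are hypothetical. A real implementation inside sheep would need a name that is unique per worker thread, not just per process.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not sheepdog code: make `len` bytes visible at
 * `path` only once they are fully written.
 * Returns 0 on success or if another writer published first, -1 on error. */
static int publish_object(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	int fd, n, ret = -1;

	/* Assumed naming scheme: unique per process only. */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp.%ld", path, (long)getpid());
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fd = open(tmp, O_CREAT | O_EXCL | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	/* Treat a short write as a failure and publish nothing. */
	if (write(fd, data, len) != (ssize_t)len)
		goto out;

	/* link() is atomic and fails with EEXIST if the name is taken,
	 * so the first complete copy wins and later ones are dropped. */
	if (link(tmp, path) == 0 || errno == EEXIST)
		ret = 0;
out:
	close(fd);
	unlink(tmp);
	return ret;
}
```

With this scheme a reader that finds the file can assume it is complete, so no polling on st_size would be needed. The cost is one extra link()/unlink() pair per pulled object.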

Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
 sheep/object_cache.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/sheep/object_cache.c b/sheep/object_cache.c
index e091930..05e50d5 100644
--- a/sheep/object_cache.c
+++ b/sheep/object_cache.c
@@ -448,9 +448,17 @@ static int create_cache_object(struct object_cache *oc, uint32_t idx, void *buff
 	fd = open(buf.buf, flags, def_fmode);
 	if (fd < 0) {
 		if (errno == EEXIST) {
+			struct stat st;
+			stat(buf.buf, &st);
+			/* fd is invalid here; wait for the pull worker to write the file */
+			while (!st.st_size) {
+				pthread_yield();
+				stat(buf.buf, &st);
+			}
 			dprintf("%08"PRIx32" already created\n", idx);
 			goto out;
 		}
+		dprintf("%m\n");
 		ret = SD_RES_EIO;
 		goto out;
 	}
@@ -526,7 +534,7 @@ static int object_cache_pull(struct vnode_info *vnodes, struct object_cache *oc,
 	ret = forward_read_obj_req(&read_req);
 
 	if (ret == SD_RES_SUCCESS) {
-		dprintf("oid %"PRIx64"pulled successfully\n", oid);
+		dprintf("oid %"PRIx64" pulled successfully\n", oid);
 		ret = create_cache_object(oc, idx, buf, data_length);
 	}
 	free(buf);
-- 
1.7.10.2



