[sheepdog] [RFC PATCH] object cache: revert object_cache_pull() to older version

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Jun 4 10:53:24 CEST 2012


At Mon, 04 Jun 2012 16:15:48 +0800,
Liu Yuan wrote:
> 
> On 06/04/2012 04:07 PM, MORITA Kazutaka wrote:
> 
> >> I am not 100% sure about this issue. It comes from my experience
> >> developing sheepfs, where I used a single FD to read/write. Since FUSE
> >> issues highly concurrent requests, I noticed the same error as in the
> >> example above: the error code is quite random (above it is '54014b01').
> >>
> >> After a long time debugging, I came to the conclusion that the problem
> >> *might* be:
> >>
> >> Subsequent read/write requests interleave with the previous one
> >> and wrongly read its response.
> > I think we should find out how they interleave before working out how
> > to fix it.
> > 
> > The current fd cache seems to allow multiple accesses to the same node
> > because cached_fds is a thread-local variable and there is no fd which
> > is used by multiple threads at the same time.
> 
> 
> Ah, yes, it is thread local. Then I have no idea how the ret value could
> be random, and I haven't found a reliable way to reproduce this problem.

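One possibility: if forward_write_obj_req() returns on an error before it
has received the responses to requests it already sent, those fds stay in
the cache with unread responses pending. The next
forward_(read|write)_obj_req() that reuses such an fd could then read the
stale response as its own, which would look like interleaving.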

The untested patch below may fix the problem, though it is a crude
approach.

diff --git a/sheep/gateway.c b/sheep/gateway.c
index d287d0c..a8e090e 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -124,7 +124,7 @@ int forward_write_obj_req(struct request *req)
 		if (fd < 0) {
 			eprintf("failed to connect to %s:%"PRIu32"\n", name, v->port);
 			ret = SD_RES_NETWORK_ERROR;
-			goto out;
+			goto err;
 		}
 
 		ret = send_req(fd, &fwd_hdr, req->data, &wlen);
@@ -132,7 +132,7 @@ int forward_write_obj_req(struct request *req)
 			del_sheep_fd(fd);
 			ret = SD_RES_NETWORK_ERROR;
 			dprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			goto err;
 		}
 
 		pfds[nr_fds].fd = fd;
@@ -151,7 +151,8 @@ int forward_write_obj_req(struct request *req)
 
 		if (rsp->result != SD_RES_SUCCESS) {
 			eprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			ret = rsp->result;
+			goto err;
 		}
 	}
 
@@ -212,6 +213,10 @@ again:
 	}
 out:
 	return ret;
+err:
+	for (i = 0; i < nr_fds; i++)
+		del_sheep_fd(pfds[i].fd);
+	return ret;
 }
 
 static int fix_object_consistency(struct request *req)
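
To make the suspected failure mode concrete, here is a minimal standalone
sketch, not sheepdog code. It assumes a toy wire format in which the peer
echoes the request id back, and it uses socketpair() as a stand-in for a
cached connection to another node. The names toy_msg, peer_answer(),
send_toy_req() and second_request_sees() are all hypothetical; closing the
socket pair plays the role that del_sheep_fd() plays in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical toy wire format: the peer echoes the request id back. */
struct toy_msg {
	uint32_t id;
};

/* Toy peer: read one request and send it back as the response. */
static int peer_answer(int peer_fd)
{
	struct toy_msg m;

	if (read(peer_fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return -1;
	if (write(peer_fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return -1;
	return 0;
}

/* Send one request with the given id on the (cached) connection. */
static int send_toy_req(int fd, uint32_t id)
{
	struct toy_msg m = { .id = id };

	assert(fd >= 0);
	return write(fd, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
}

/*
 * Request 1 is sent and answered, but the caller gives up before reading
 * the response. Request 2 is then sent and its response is read.
 * Returns the id found in the response that request 2 reads (0 on failure).
 */
uint32_t second_request_sees(int drop_fd_on_error)
{
	int sv[2];
	struct toy_msg rsp = { .id = 0 };

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	/* Request 1: the response is left unread, as on an early error exit. */
	if (send_toy_req(sv[0], 1) < 0 || peer_answer(sv[1]) < 0)
		goto out;

	if (drop_fd_on_error) {
		/* Error path of the patch: discard the connection, reconnect. */
		close(sv[0]);
		close(sv[1]);
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
			return 0;
	}

	/* Request 2: reads whatever response is first in the stream. */
	if (send_toy_req(sv[0], 2) < 0 || peer_answer(sv[1]) < 0)
		goto out;
	if (read(sv[0], &rsp, sizeof(rsp)) != (ssize_t)sizeof(rsp))
		rsp.id = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return rsp.id;
}
```

With drop_fd_on_error set to 0, the second request picks up the response
that belongs to the first one; with it set to 1, the stale response is
thrown away together with the connection and the second request reads its
own response. Whether this is what actually happens in the gateway is
still the open question in this thread.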


