[sheepdog] [RFC PATCH] object cache: revert object_cache_pull() to older version
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Mon Jun 4 10:53:24 CEST 2012
At Mon, 04 Jun 2012 16:15:48 +0800,
Liu Yuan wrote:
>
> On 06/04/2012 04:07 PM, MORITA Kazutaka wrote:
>
> >> I am not 100% about this issue. It is from the experience from
> >> development of sheepfs, when I use a single FD to read/write. Since FUSE
> >> will issue highly concurrent requests, I noticed the same error as above
> >> example: the error code is quite random (see above is '54014b01').
> >>
> >> After a long time debugging, I came to a conclusion that the problem
> >> *might* be:
> >>
> >> The subsequent read/write requests interleaves with the previous one,
> >> and wrongly read the response.
> > I think we should reveal how they interleave before working out how to
> > fix.
> >
> > The current fd cache seems to allow multiple accesses to the same node
> > because cached_fds is a thread-local variable and there is no fd which
> > is used by multiple threads at the same time.
>
>
> Ah, yes, it is thread local. Then I have no idea how the ret value could
> be random, I don't find a reliable way to reproduce this problem.
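
One possibility: if forward_write_obj_req() fails after sending requests
but before receiving the responses, the fds it used stay in the cache
with replies still pending, so the next forward_(read|write)_obj_req()
on the same fd could read a response that belongs to the earlier
request.

The untested patch below may fix the problem, though the approach is
crude: on error it simply deletes every fd collected in pfds.
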
diff --git a/sheep/gateway.c b/sheep/gateway.c
index d287d0c..a8e090e 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -124,7 +124,7 @@ int forward_write_obj_req(struct request *req)
 		if (fd < 0) {
 			eprintf("failed to connect to %s:%"PRIu32"\n", name, v->port);
 			ret = SD_RES_NETWORK_ERROR;
-			goto out;
+			goto err;
 		}
 		ret = send_req(fd, &fwd_hdr, req->data, &wlen);
@@ -132,7 +132,7 @@ int forward_write_obj_req(struct request *req)
 			del_sheep_fd(fd);
 			ret = SD_RES_NETWORK_ERROR;
 			dprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			goto err;
 		}
 		pfds[nr_fds].fd = fd;
@@ -151,7 +151,8 @@ int forward_write_obj_req(struct request *req)
 		if (rsp->result != SD_RES_SUCCESS) {
 			eprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			ret = rsp->result;
+			goto err;
 		}
 	}
@@ -212,6 +213,10 @@ again:
 	}
 out:
 	return ret;
+err:
+	for (i = 0; i < nr_fds; i++)
+		del_sheep_fd(pfds[i].fd);
+	return ret;
 }
 static int fix_object_consistency(struct request *req)
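
To make the suspected interleave concrete, here is a small standalone
sketch. It is not sheepdog code: demo_req, demo_rsp and
reply_seen_by_second_request() are hypothetical names, the messages are
not the sheepdog wire format, and a socketpair stands in for a cached
connection to another node. It only shows that if a request is answered
but the reply is never read, the next request on the same fd reads the
old reply, and that dropping the fd on error avoids this. Whether this
is what actually happens in the gateway is still unconfirmed.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical fixed-size messages, not the sheepdog protocol. */
struct demo_req { uint32_t id; };
struct demo_rsp { uint32_t id; };

/* Toy peer: read one request from peer_fd and queue a reply with its id. */
static int peer_answer_one(int peer_fd)
{
	struct demo_req req;
	struct demo_rsp rsp;

	if (read(peer_fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
		return -1;
	rsp.id = req.id;
	if (write(peer_fd, &rsp, sizeof(rsp)) != (ssize_t)sizeof(rsp))
		return -1;
	return 0;
}

/* Send request `id` on fd and let the toy peer answer it. */
static int send_and_answer(int fd, int peer_fd, uint32_t id)
{
	struct demo_req req = { .id = id };

	if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
		return -1;
	return peer_answer_one(peer_fd);
}

/*
 * Returns the id carried by the reply that request 2 reads,
 * or 0 if the demo itself fails to set up.
 */
uint32_t reply_seen_by_second_request(int drop_fd_on_error)
{
	int sv[2];
	struct demo_rsp rsp = { 0 };

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	/*
	 * Request 1 is sent and answered, but the caller bails out
	 * (as if another node had failed) without reading the reply.
	 */
	if (send_and_answer(sv[0], sv[1], 1) < 0)
		goto out;

	if (drop_fd_on_error) {
		/* Analogous to deleting the cached fd: start from a fresh one. */
		close(sv[0]);
		close(sv[1]);
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
			return 0;
	}

	/* Request 2 reads whatever reply is first in the socket buffer. */
	if (send_and_answer(sv[0], sv[1], 2) < 0)
		goto out;
	if (read(sv[0], &rsp, sizeof(rsp)) != (ssize_t)sizeof(rsp))
		rsp.id = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return rsp.id;
}
```

Calling reply_seen_by_second_request(0) reuses the fd and yields the
reply to request 1; calling it with 1 drops the fd first and yields the
reply to request 2.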