At Mon, 04 Jun 2012 16:15:48 +0800,
Liu Yuan wrote:
> 
> On 06/04/2012 04:07 PM, MORITA Kazutaka wrote:
> >> I am not 100% sure about this issue. It is from the experience from
> >> > development of sheepfs, when I use a single FD to read/write. Since FUSE
> >> > will issue highly concurrent requests, I noticed the same error as above
> >> > example: the error code is quite random (see above is '54014b01').
> >> > 
> >> > After a long time debugging, I came to a conclusion that the problem
> >> > *might* be:
> >> > 
> >> > The subsequent read/write requests interleave with the previous one,
> >> > and wrongly read the response.
> > I think we should reveal how they interleave before working out how to
> > fix.
> > 
> > The current fd cache seems to allow multiple accesses to the same node
> > because cached_fds is a thread-local variable and there is no fd which
> > is used by multiple threads at the same time.
> > 
> Ah, yes, it is thread local. Then I have no idea how the ret value could
> be random; I can't find a reliable way to reproduce this problem.

One possibility is that if forward_write_obj_req() fails before
receiving data, the next forward_(read|write)_obj_req() could be
interleaved with it, i.e. it could read the response that was left
unread on the cached fd. The untested patch below may fix the problem,
though the approach is a poor one.
diff --git a/sheep/gateway.c b/sheep/gateway.c
index d287d0c..a8e090e 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -124,7 +124,7 @@ int forward_write_obj_req(struct request *req)
 		if (fd < 0) {
 			eprintf("failed to connect to %s:%"PRIu32"\n", name, v->port);
 			ret = SD_RES_NETWORK_ERROR;
-			goto out;
+			goto err;
 		}
 
 		ret = send_req(fd, &fwd_hdr, req->data, &wlen);
@@ -132,7 +132,7 @@ int forward_write_obj_req(struct request *req)
 			del_sheep_fd(fd);
 			ret = SD_RES_NETWORK_ERROR;
 			dprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			goto err;
 		}
 
 		pfds[nr_fds].fd = fd;
@@ -151,7 +151,8 @@ int forward_write_obj_req(struct request *req)
 
 		if (rsp->result != SD_RES_SUCCESS) {
 			eprintf("fail %"PRIu32"\n", ret);
-			goto out;
+			ret = rsp->result;
+			goto err;
 		}
 	}
 
@@ -212,6 +213,10 @@ again:
 	}
 out:
 	return ret;
+err:
+	for (i = 0; i < nr_fds; i++)
+		del_sheep_fd(pfds[i].fd);
+	return ret;
 }
 
 static int fix_object_consistency(struct request *req)