[sheepdog] [RFC PATCH] object cache: revert object_cache_pull() to older version

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Jun 4 10:56:38 CEST 2012


At Mon, 04 Jun 2012 17:53:24 +0900,
MORITA Kazutaka wrote:
> 
> At Mon, 04 Jun 2012 16:15:48 +0800,
> Liu Yuan wrote:
> > 
> > On 06/04/2012 04:07 PM, MORITA Kazutaka wrote:
> > 
> > >> I am not 100% sure about this issue. It comes from my experience
> > >> developing sheepfs, when I used a single FD to read/write. Since FUSE
> > >> issues highly concurrent requests, I noticed the same error as in the
> > >> example above: the error code is quite random (above it is '54014b01').
> > >>
> > >> After a long time debugging, I came to the conclusion that the problem
> > >> *might* be:
> > >>
> > >> The subsequent read/write requests interleave with the previous one
> > >> and wrongly read the response.
> > > I think we should find out how they interleave before working out how
> > > to fix it.
> > > 
> > > The current fd cache seems to allow multiple accesses to the same node
> > > because cached_fds is a thread-local variable and there is no fd which
> > > is used by multiple threads at the same time.
> > 
> > 
> > Ah, yes, it is thread local. Then I have no idea how the ret value could
> > be random, and I haven't found a reliable way to reproduce this problem.
> 
> One possibility is that if forward_write_obj_req() fails before
> receiving data, the next forward_(read|write)_obj_req() could be
> interleaved.

What I meant is that the next forward_(read|write)_obj_req() could read
the result left over from the previous forward_write_obj_req().
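
The scenario above can be sketched in C to make it concrete. This is not
sheepdog code: struct msg, send_msg(), recv_msg(), peer_answer_one() and
stale_response_demo() are hypothetical names, the 8-byte wire format is an
assumption, and a socketpair() stands in for the connection to the remote
node. The first request is answered by the peer, but the caller gives up
before reading the response; the second request on the same fd then reads
that leftover response.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire format: every request and response carries an id. */
struct msg {
	uint32_t id;
	uint32_t result;
};

/* Write one fixed-size message to fd. */
static int send_msg(int fd, uint32_t id)
{
	struct msg m = { .id = id, .result = 0 };

	return write(fd, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
}

/* Read one fixed-size message from fd. */
static int recv_msg(int fd, struct msg *m)
{
	return read(fd, m, sizeof(*m)) == (ssize_t)sizeof(*m) ? 0 : -1;
}

/* Toy peer: read one request and answer it with the same id. */
static int peer_answer_one(int fd)
{
	struct msg m;

	if (recv_msg(fd, &m) < 0)
		return -1;
	return send_msg(fd, m.id);
}

/*
 * Returns the id of the response that request 2 actually receives,
 * or -1 on an I/O error.
 */
int stale_response_demo(void)
{
	int sv[2];
	struct msg rsp;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Request 1: sent and answered by the peer ... */
	if (send_msg(sv[0], 1) < 0 || peer_answer_one(sv[1]) < 0)
		goto out;
	/* ... but the caller bails out here without reading the response. */

	/* Request 2 reuses the same fd, as a cached fd would be reused. */
	if (send_msg(sv[0], 2) < 0 || peer_answer_one(sv[1]) < 0)
		goto out;
	if (recv_msg(sv[0], &rsp) < 0)
		goto out;

	ret = (int)rsp.id;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Here stale_response_demo() returns 1 rather than 2, because request 2
picks up the response to request 1. In this sketch, closing the fd after
the abandoned request instead of reusing it would avoid the mismatch.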

Thanks,

Kazutaka


