[sheepdog] [RFC PATCH] object cache: revert object_cache_pull() to older version

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Jun 4 08:52:33 CEST 2012


At Mon, 04 Jun 2012 14:12:10 +0800,
Liu Yuan wrote:
> 
> On 06/04/2012 02:04 PM, Liu Yuan wrote:
> 
> > The current object_cache_pull() causes the bug below:
> > ...
> > do_gateway_request(288) 2, 80d6d76e00000000 , 1
> > Jun 04 10:16:37 connect_to(241) 2126, 10.232.134.3:7000
> > Jun 04 10:16:37 client_handler(747) closed connection 2116
> > Jun 04 10:16:37 destroy_client(678) connection from: 127.0.0.1:60214
> > Jun 04 10:16:37 listen_handler(797) accepted a new connection: 2116
> > Jun 04 10:16:37 client_rx_handler(586) connection from: 127.0.0.1:60216
> > Jun 04 10:16:37 queue_request(385) 2
> > Jun 04 10:16:37 do_gateway_request(288) 2, 80d6d76e00000000 , 1
> > Jun 04 10:16:37 do_gateway_request(308) failed: 2, 80d6d76e00000000 , 1, 54014b01
> > ...
> > 
> > This is because we use forward_read_obj_req(), which tries to multiplex a socket
> > FD if concurrent requests access the same object and are unfortunately routed to
> > the same node.
> > 
> > The object cache is under very high pressure from concurrent requests to the
> > same COW object from cloned VMs, so this problem emerges. It looks to me that,
> > besides the object cache, QEMU requests are also subject to this problem,
> > because QEMU's sheepdog block layer can issue multiple requests in one go.
> 
> 
> The alternative fix is to write a new fd cache, which allows multiple FDs
> to the same node. This looks like a better fix that sorts out all the related
> problems.
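
As an illustration of the alternative proposed above, here is a minimal
sketch of an fd cache that keeps several slots per node, so that concurrent
requests to the same node never share one socket. It is not sheepdog code:
the names node_fd_pool, pool_get(), pool_put() and SLOTS_PER_NODE are
hypothetical, and the sketch only tracks slot ownership; connecting the
socket stored in a slot is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS_PER_NODE 4	/* hypothetical per-node limit */

struct fd_slot {
	int fd;			/* -1 until the owner connects a socket */
	atomic_bool in_use;	/* true while one request owns this slot */
};

struct node_fd_pool {
	struct fd_slot slots[SLOTS_PER_NODE];
};

static void pool_init(struct node_fd_pool *p)
{
	for (int i = 0; i < SLOTS_PER_NODE; i++) {
		p->slots[i].fd = -1;
		atomic_init(&p->slots[i].in_use, false);
	}
}

/*
 * Claim a free slot for exclusive use by one request.
 * Returns the slot index, or -1 if every slot is busy; a caller could
 * then fall back to a short-lived connection (an assumption, not shown).
 */
static int pool_get(struct node_fd_pool *p)
{
	for (int i = 0; i < SLOTS_PER_NODE; i++) {
		bool expected = false;

		/* Only one thread can flip in_use from false to true. */
		if (atomic_compare_exchange_strong(&p->slots[i].in_use,
						   &expected, true))
			return i;
	}
	return -1;
}

/* Release a slot so that another request can reuse its fd. */
static void pool_put(struct node_fd_pool *p, int idx)
{
	assert(idx >= 0 && idx < SLOTS_PER_NODE);
	atomic_store(&p->slots[idx].in_use, false);
}
```

With one pool per node, two requests routed to the same node get two
different slots from pool_get(), so the request and response of one cannot
interleave with the other on a shared FD.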

Can you explain in more detail how the current fd cache causes the above
problem for concurrent accesses to the same node?

Thanks,

Kazutaka


