At Thu, 07 Jun 2012 23:25:39 +0800, Liu Yuan wrote:
>
> On 06/07/2012 11:20 PM, Liu Yuan wrote:
> > On 06/07/2012 11:07 PM, MORITA Kazutaka wrote:
> >
> >> 5 seconds is actually too short, but is it really good to remove
> >> the timeout completely? Without a timeout, how long does send/recv/poll
> >> block when a network error happens, and how long do guest OSes wait for
> >> read/write/flush to return?
> >
> > How about setting SO_KEEPALIVE for the connection? Letting the kernel
> > handle the heart-beat message seems to work.
> >
>
> But we already have a membership backend to send heart-beat messages to
> all nodes, which is why I originally planned to remove the timeout
> completely. It looks to me that we don't actually need any timeout
> mechanism again. The previous code needed this timeout because IO blocked
> confchg, but now we don't have this constraint, so maybe just removing it
> would be clean and good enough.

The reason we use a timeout for socket connections is that, when a
membership change happens, the gateway should retry I/Os with the new
membership instead of sleeping for a long time in
forward_read/write_obj_req with the old membership.

If send/recv/poll blocks for a long time in the gateway node, a timeout
happens in the guest OSes, which is what we really want to avoid.

Thanks,

Kazutaka
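
To illustrate the point about retrying with a new membership, here is a minimal sketch of a receive that waits in short slices and gives up as soon as the membership epoch changes. It is not the sheepdog code (which is C); it is written in Python only for brevity. The names recv_or_abort, MembershipChanged and get_epoch, and the default intervals, are hypothetical and chosen for this example.

```python
import select
import socket
import time


class MembershipChanged(Exception):
    """Hypothetical signal: the cluster epoch moved while we were waiting."""


def recv_or_abort(sock, nbytes, get_epoch, poll_interval=0.1, max_wait=5.0):
    """Wait for data in short slices instead of one long blocking recv.

    get_epoch is an assumed callback returning the current membership
    epoch. If it changes while we wait, we stop so that the caller can
    retry the I/O against the new membership.
    """
    start_epoch = get_epoch()
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        # Block for at most poll_interval seconds waiting for readable data.
        readable, _, _ = select.select([sock], [], [], poll_interval)
        if readable:
            return sock.recv(nbytes)
        # Nothing arrived in this slice: check whether membership changed.
        if get_epoch() != start_epoch:
            raise MembershipChanged("epoch changed while waiting")
    raise TimeoutError("no reply within max_wait seconds")
```

With this shape, the short poll_interval bounds how long the gateway keeps waiting on a stale membership, while max_wait remains an upper bound so that guest I/O cannot hang indefinitely on a network error.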