[sheepdog] [PATCH 2/2] sheep: remove timeout for socket pool

Thu Jun 7 17:50:16 CEST 2012

At Thu, 07 Jun 2012 23:25:39 +0800,
Liu Yuan wrote:
> 
> On 06/07/2012 11:20 PM, Liu Yuan wrote:
> 
> > On 06/07/2012 11:07 PM, MORITA Kazutaka wrote:
> > 
> >> 5 seconds is actually too short, but is it really good to remove
> >> timeout completely?  Without timeout, how long does send/recv/poll
> >> block when network error happens, and how long do guest OSes wait for
> >> read/write/flush to return?
> > 
> > 
> > How about setting SO_KEEPALIVE for the connection? Letting the kernel
> > handle the heart-beat messages seems to work.
> > 
> 
> 
> But we already have a membership backend that sends heart-beat messages
> to all nodes, which is why I originally planned to remove the timeout
> completely; it looks to me like we don't actually need any timeout
> mechanism any more.  The previous code needed this timeout because IO
> blocked confchg, but now we don't have this constraint, so maybe just
> removing it would be clean and good enough.

The reason we use a timeout for socket connections is that, when a
membership change happens, the gateway should retry I/Os with the new
membership instead of sleeping for a long time in
forward_read/write_obj_req with the old membership.  If send/recv/poll
blocks for a long time on the gateway node, the guest OSes time out,
which is what we really want to avoid.
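
As a rough illustration of the timeout described above (a sketch only,
not the actual sheepdog code; set_io_timeout and recv_timed_out are
hypothetical names, and the timeout value is left to the caller), a
per-socket bound on blocking send/recv could look like this, using the
standard SO_RCVTIMEO/SO_SNDTIMEO socket options:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make blocking send()/recv() on fd give up after
 * `seconds` instead of sleeping indefinitely on a dead peer.
 */
static int set_io_timeout(int fd, int seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: returns 1 if recv() timed out (the caller would
 * then retry the I/O against the current membership), 0 if recv()
 * returned data or EOF, and -1 on any other error.
 */
static int recv_timed_out(int fd, void *buf, size_t len)
{
	ssize_t ret;

	do {
		ret = recv(fd, buf, len, 0);
	} while (ret < 0 && errno == EINTR);	/* restart on signals */

	if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 1;			/* timeout expired */
	return ret < 0 ? -1 : 0;
}
```

With something like this, a gateway that sees recv_timed_out() return 1
can re-read the membership and retry, rather than staying blocked on a
node that has already left.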

Thanks,

Kazutaka