[sheepdog] fix embryonic connection

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Sep 4 09:43:56 CEST 2012


At Tue, 04 Sep 2012 15:00:56 +0800,
Liu Yuan wrote:
> 
> On 09/04/2012 12:38 AM, Liu Yuan wrote:
> > ESTAB      0      52                                     127.0.0.1:48339                                 127.0.0.1:7001
> > timer:(on,49sec,12) users:(("sheep",4961,16)) ino:855713 sk:ffff88007086de80 ts sack cubic wscale:7,7 rto:120000 rtt:18.75/7.5 ato:40 ssthresh:7 send 14.0Mbps rcv_space:32792
> 
> I guess I found out why keepalive doesn't take any effect: there
> are 52 bytes (our header!) sitting in the send buffer, not yet
> acknowledged by the remote host, so the RTO timer fires, which mutes
> the keepalive timer.
> 
> keepalive has the following benefit:
> 
>    when the remote node is just busy (e.g. doing disk IO), not
> really down, poll might time out falsely and we would close all the
> valid fds of that node.
> 
> Actually, if data has already been sent to the remote node, then when
> the remote node crashes, poll & keepalive work well.
> 
> So basically there are two possible fixes:
>  1 add a timeout to poll (pros: simple; cons: false timeouts)
>  2 add a timeout to poll only if we detect that the send buffer is not
> empty, and otherwise rely on keepalive (pros: efficient; cons: more
> complex code); see the sketch below.
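> 
> Something like the following would implement 2 (a sketch only;
> SIOCOUTQ is Linux-specific and poll_request() is a made-up name, not
> an existing sheep function):
> 
>     #include <poll.h>
>     #include <sys/ioctl.h>
>     #include <linux/sockios.h>   /* SIOCOUTQ */
> 
>     #define POLL_TIMEOUT_MS 5000 /* made-up value */
> 
>     /* Sketch: use a poll timeout only when unacked bytes sit in the
>      * send buffer; otherwise block and let keepalive do its work. */
>     static int poll_request(int fd, short events)
>     {
>             struct pollfd pfd = { .fd = fd, .events = events };
>             int pending = 0, timeout = -1;
> 
>             if (ioctl(fd, SIOCOUTQ, &pending) == 0 && pending > 0)
>                     timeout = POLL_TIMEOUT_MS;
> 
>             return poll(&pfd, 1, timeout); /* 0 means timed out */
>     }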
> 
> I prefer 2, because most of the time (during normal operation) we
> don't hit this corner case.
> 
> What do you think? Kazutaka.

A false timeout is not a problem if we simply retry poll when the
epoch has not been incremented, is it?  If that is true, I prefer the
first approach.  I guess we can handle timeouts simply and efficiently
that way, without TCP keepalive.
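
Roughly what I have in mind (just a sketch; get_epoch() stands in for
however we read the current cluster epoch, and the retry shape is made
up for illustration):

    #include <poll.h>
    #include <stdint.h>

    /* Sketch: retry a timed-out poll while the cluster epoch is
     * unchanged, so a false timeout on a merely busy node is
     * harmless. */
    static int poll_retry(struct pollfd *pfd, int timeout_ms,
                          uint32_t (*get_epoch)(void))
    {
            uint32_t epoch = get_epoch();
            int ret;

            for (;;) {
                    ret = poll(pfd, 1, timeout_ms);
                    if (ret != 0)      /* ready, or error */
                            return ret;
                    if (get_epoch() != epoch)
                            return 0;  /* epoch changed: real timeout */
                    /* epoch unchanged: treat as false timeout, retry */
            }
    }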

I wonder whether using TCP keepalive is the right way to go.  To be
honest, I'm not very familiar with TCP keepalive.  Does it really
work with hundreds of nodes or under heavy load?  I'm afraid there
are other problems like this one.

Thanks,

Kazutaka


