[sheepdog] [PATCH 1/2] test: add a test for sockfd keepalive

Mon Sep 3 16:11:36 CEST 2012

At Mon, 03 Sep 2012 23:07:34 +0900,
MORITA Kazutaka wrote:
> 
> At Mon, 03 Sep 2012 21:30:09 +0800,
> Liu Yuan wrote:
> > 
> > On 09/03/2012 08:24 PM, MORITA Kazutaka wrote:
> > > No.  The reason I doubt keepalive is that, when the trouble happens,
> > > the script always takes 15 minutes.  I just guess the connection is
> > > closed by another timeout, but I'm not sure.  So, I wrote 'perhaps'.
> > > 
> > >> > 
> > >> > I am not sure, but I think the current keepalive implementation looks okay to me; it is simple
> > >> > and efficient. I have tested it in various situations besides this script. If there is any
> > >> > problem inside the code, I'd like to fix the bug instead of running away from it completely.
> > > Okay, but in the future, it would be worth considering removing TCP keepalive.
> > > Checking node availability is the job of the cluster driver.
> > 
> > All the hangs are suspected to be governed by the RTO instead of the keepalive timer. Could you please tell me where
> > the thread hangs? 
> 
> It waits for a response from the unreachable node at poll() in
> wait_forward_request().  I'm not sure why it returns after keepalive

s/it returns/it doesn't return/
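
For reference, here is a minimal sketch of the two mechanisms under
discussion: enabling TCP keepalive with explicit timers on a socket, and
waiting for a response with a bounded poll() instead of blocking
indefinitely. It is not sheepdog's code; the helper names
enable_keepalive() and wait_readable() and their parameters are
hypothetical, and TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are assumed
to be available (Linux). As far as I know, Linux sends keepalive probes
only while the connection is idle; while sent data is still
unacknowledged, the retransmission timer decides when the connection
fails, and with the default tcp_retries2 of 15 that takes roughly 15
minutes. That would be consistent with the hang described above, but it
is a guess, not a confirmed diagnosis.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from sheepdog): enable TCP keepalive on fd.
 * idle_sec:  idle time before the first probe is sent
 * intvl_sec: interval between unanswered probes
 * probes:    unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on failure (errno is set by setsockopt()).
 */
static int enable_keepalive(int fd, int idle_sec, int intvl_sec, int probes)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_sec, sizeof(idle_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &intvl_sec, sizeof(intvl_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
		       &probes, sizeof(probes)) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: wait until fd is readable, but give up after
 * timeout_ms so that the caller never depends on a kernel timer alone.
 * Returns 1 if data (or EOF) can be read, 0 on timeout, and -1 if
 * poll() failed or reported an error condition without readable data.
 * EINTR is not retried here, to keep the sketch short.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;	/* 0: timed out, -1: poll() failed */
	if (pfd.revents & POLLIN)
		return 1;
	return -1;		/* POLLERR, POLLHUP or POLLNVAL only */
}
```

A caller would run enable_keepalive() once after connect() and then
treat a 0 from wait_readable() as "the peer did not answer in time",
whatever the keepalive or retransmission timers are doing.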

Thanks,

Kazutaka