[sheepdog] [PATCH 1/2] test: add a test for sockfd keepalive

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Mon Sep 3 14:24:27 CEST 2012


At Mon, 03 Sep 2012 19:54:22 +0800,
Liu Yuan wrote:
> > 
> > I found that this script occasionally takes a lot of time (about
> > 15 minutes).  Perhaps TCP keepalive is not working in some
> > situations?  This problem is highly reproducible in my environment
> > with the following script.
> > 
> >  $ while test "$?" -eq 0; do ./check 35 -corosync; done
> 
> I'll try this script, thanks. But I'd remind you that sometimes the problem is caused
> by the script itself. Could you check the log and make sure that it is the code that causes
> the problem?

No.  The reason I suspect keepalive is that, whenever the problem occurs,
the script always takes 15 minutes.  My guess is that the connection is
being closed by some other timeout, but I'm not sure.  That is why I
wrote 'perhaps'.
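
One candidate for such a timeout, offered only as an assumption and not
confirmed anywhere in this thread, is the Linux TCP retransmission
limit.  The sketch below estimates how long a connection with
unacknowledged data would keep retrying before giving up.  It assumes a
0.2 s minimum RTO, a 120 s maximum RTO, and net.ipv4.tcp_retries2 at
its default of 15.  The function name is hypothetical.

```python
# Hypothetical estimate of Linux's TCP retransmission give-up time.
# Assumptions (not from this thread): minimum RTO of 0.2 s, maximum RTO
# of 120 s, and net.ipv4.tcp_retries2 left at its default of 15.

def retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the initial wait plus one backed-off wait per retransmission."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):  # the original send plus `retries` resends
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff, capped at rto_max
    return total

seconds = retransmit_timeout()
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")
```

Under those assumptions the estimate comes to roughly 15 minutes, which
is in the same range as the delay observed here.  It would need to be
checked against the logs before drawing any conclusion.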

> 
> I am not sure, but I think the current keepalive implementation looks okay; it is simple
> and efficient. I have tested it in various situations besides this script. If there is any
> problem in the code, I'd like to fix the bug instead of running away from it completely.

Okay, but in the future it would be worth considering removing TCP
keepalive.  Checking node availability is the job of the cluster driver.

Thanks,

Kazutaka


