[sheepdog] fix embryonic connection

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Sep 4 10:56:38 CEST 2012


At Tue, 04 Sep 2012 16:28:20 +0800,
Liu Yuan wrote:
> 
> On 09/04/2012 03:43 PM, MORITA Kazutaka wrote:
> > Isn't a false timeout harmless if we simply retry poll while the epoch
> > is not incremented?  If so, I prefer the first approach.  I guess
> > we can handle timeouts simply and efficiently this way, without TCP
> > keepalive.
> > 
> > I wonder if TCP keepalive is the right way to go.  To be
> > honest, I'm not very familiar with TCP keepalive.  Does it really work
> > with hundreds of nodes or under heavy load?  I'm afraid there may be
> > other problems like this one.
> 
> Keepalive messages are only sent when there is no data transfer on the
> connection, so most of the time the timer won't fire.  I have
> finished the patch with approach 2 and it works well now.  I am going to
> post the patch set.
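For readers unfamiliar with TCP keepalive, the point about probes being sent only on an idle connection maps onto the Linux socket options below. This is a minimal sketch, assuming Linux (`TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific); the helper name `set_keepalive` and the example values are illustrative and are not taken from the posted patch set.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable TCP keepalive on a connected socket.
 *   idle  - seconds the connection must be idle before the first probe
 *   intvl - seconds between unanswered probes
 *   cnt   - unanswered probes before the kernel reports the peer dead
 * Returns 0 on success, -1 (with errno set by setsockopt) on failure.
 */
int set_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	/* No probes are sent while data keeps flowing; the idle timer restarts. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	return 0;
}
```

With illustrative values such as `set_keepalive(fd, 60, 10, 5)`, a silently dead peer would be detected after roughly 60 + 10 × 5 = 110 seconds of idleness, while a busy connection never sends a probe at all.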

If it works well, it's okay with me for now.  But in the future, I'd like
to consider removing TCP keepalive.  If the target node is in the
sheepdog node list, sheep should keep the connection even if a timeout
fires.  If the target is not in the node list, sheep should close
the connection ASAP.  Availability checks should be done by the
cluster drivers.  That's why I suggested "retry poll while the
epoch is not updated".
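The policy above (a poll timeout alone never closes a connection; only a membership change that removed the peer does) could look roughly like the C sketch below. Everything in it is hypothetical: `on_poll_timeout`, `wait_for_peer` and the `get_epoch_fn`/`peer_in_list_fn` hooks are illustrative names, not sheepdog's actual code, and the membership lookup is assumed to be backed by the node list that the cluster driver maintains.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>

/* What to do when poll() times out on a peer connection. */
enum timeout_action {
	TIMEOUT_RETRY,	/* keep the connection and poll again */
	TIMEOUT_CLOSE,	/* peer left the cluster: close the connection ASAP */
};

/*
 * Pure decision step (hypothetical): an unchanged epoch means membership
 * did not change, so the timeout is treated as false and we retry.
 */
enum timeout_action on_poll_timeout(uint32_t wait_epoch, uint32_t cur_epoch,
				    bool peer_in_list)
{
	if (cur_epoch == wait_epoch)
		return TIMEOUT_RETRY;
	return peer_in_list ? TIMEOUT_RETRY : TIMEOUT_CLOSE;
}

/* Hypothetical hooks into the daemon's epoch and node list. */
typedef uint32_t (*get_epoch_fn)(void);
typedef bool (*peer_in_list_fn)(const void *peer);

/* Returns 1 when fd is readable, 0 when the peer should be dropped, -1 on error. */
int wait_for_peer(int fd, int timeout_ms, get_epoch_fn get_epoch,
		  peer_in_list_fn in_list, const void *peer)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	uint32_t wait_epoch = get_epoch();

	for (;;) {
		int ret = poll(&pfd, 1, timeout_ms);

		if (ret > 0)
			return (pfd.revents & POLLIN) ? 1 : -1;
		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: poll again */
			return -1;
		}
		/* ret == 0: timed out, consult membership instead of giving up */
		uint32_t cur = get_epoch();
		if (on_poll_timeout(wait_epoch, cur, in_list(peer)) == TIMEOUT_CLOSE)
			return 0;
		wait_epoch = cur;	/* peer still listed: keep waiting in the new epoch */
	}
}
```

Keeping the decision in a pure function makes the policy testable without sockets; the loop only adds `poll()` handling and `EINTR` retries, and no TCP keepalive is involved.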

Thanks,

Kazutaka


