On 06/08/2012 12:06 AM, MORITA Kazutaka wrote:
> At Fri, 08 Jun 2012 00:01:20 +0800,
> Liu Yuan wrote:
>>
>> On 06/07/2012 11:50 PM, MORITA Kazutaka wrote:
>>
>>> The reason we use timeout for socket connections is that, when
>>> membership change happens, the gateway should retry I/Os with a new
>>> membership instead of sleeping long time in forward_read/write_obj_req
>>> with an old membership.  If send/recv/poll blocks for a long time in
>>> the gateway node, timeout happens in the guest OSes, which is what we
>>> really want to avoid.
>>
>>
>> So there is a dilemma: if not long enough, we will cancel a valid
>> connection which the other end is just busy.
>>
>> I am considering another approach that let recovery thread to kill those
>> blocking connection instead of timeout. How about it?
>
> Looks a good approach to me :)
>

I have tried the 'kill' idea, but found it more difficult than necessary,
so I switched to keepalive, which is considerably simpler.

I'll draft a new patch based on keepalive (timeout).

Thanks,
Yuan
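
For reference, a minimal sketch of what enabling keepalive on a gateway
socket might look like on Linux. This is not taken from the patch: the
helper name set_keepalive() and its parameters are hypothetical, and
TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific options.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (an assumption, not sheepdog code): turn on TCP
 * keepalive for fd.
 *
 *   idle_sec  - seconds of silence before the first probe is sent
 *   intvl_sec - seconds between unanswered probes
 *   probes    - unanswered probes before the kernel drops the connection
 *
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int set_keepalive(int fd, int idle_sec, int intvl_sec, int probes)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_sec, sizeof(idle_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &intvl_sec, sizeof(intvl_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
		       &probes, sizeof(probes)) < 0)
		return -1;

	return 0;
}
```

With, say, set_keepalive(fd, 5, 1, 3), an unreachable peer would be
detected after roughly 5 + 3 * 1 = 8 seconds of silence, and a blocked
recv() then returns an error. A peer that is alive but busy still answers
the probes from its kernel TCP stack, so the connection is not cancelled
just because the application on the other end is slow.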