[sheepdog] fix embryonic connection

Liu Yuan namei.unix at gmail.com
Mon Sep 3 18:38:59 CEST 2012


This patch fixes a subtle problem in connect_to(), found by the refined test 035,
which might leave a connection waiting for an unbounded period of time.
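The post does not include the diff itself. The following is therefore only a sketch of one common way to put an upper bound on connect(), assuming the problem is a blocking connect() stuck on a half-open ("embryonic") connection. The approach is to switch the socket to non-blocking mode, poll() for writability with a finite timeout, and then read SO_ERROR to learn the result. connect_with_timeout() is a hypothetical helper and is not sheepdog's connect_to().

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the sheepdog patch): connect with a bounded wait.
 * Returns 0 on success, -1 on failure or timeout with errno set.
 */
static int connect_with_timeout(int fd, const struct sockaddr *addr,
                                socklen_t len, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    int ret = connect(fd, addr, len);
    if (ret < 0 && errno == EINPROGRESS) {
        /* Handshake in progress: wait until the socket becomes writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        do {
            ret = poll(&pfd, 1, timeout_ms); /* EINTR restarts the full wait */
        } while (ret < 0 && errno == EINTR);

        if (ret == 0) {
            errno = ETIMEDOUT;               /* peer never answered in time */
            ret = -1;
        } else if (ret > 0) {
            int err = 0;
            socklen_t elen = sizeof(err);
            /* SO_ERROR carries the final outcome of the connect attempt. */
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0) {
                ret = -1;
            } else if (err != 0) {
                errno = err;
                ret = -1;
            } else {
                ret = 0;
            }
        }
    }

    /* Restore the original (blocking) mode without clobbering errno. */
    int saved = errno;
    fcntl(fd, F_SETFL, flags);
    errno = saved;
    return ret;
}
```

The EINTR loop restarts the full timeout on every interruption. That is acceptable for a sketch, but a stricter version would track a deadline instead.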

There is a more subtle bug: poll() waits on the RTO timer instead of the keepalive
timer. For example, while running 035 I hit a program hang and noticed this in the output of the command

sudo ss -iopen

...
ESTAB      0      52                                     127.0.0.1:48339                                 127.0.0.1:7001
timer:(on,49sec,12) users:(("sheep",4961,16)) ino:855713 sk:ffff88007086de80 ts sack cubic wscale:7,7 rto:120000 rtt:18.75/7.5 ato:40 ssthresh:7 send 14.0Mbps rcv_space:32792
...

where the sheep listening on 127.0.0.1:7001 had already been killed.
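A likely explanation for the RTO/keepalive puzzle comes from general Linux TCP behaviour and has not been verified in this thread. The keepalive timer only sends probes on an idle connection. Once data sits unacknowledged in the send queue (Send-Q is 52 above), the retransmission timer governs the connection instead. ss prints that timer as "on": in timer:(on,49sec,12), the third field is the retransmission count. rto:120000 is the 120 s ceiling the kernel backs off to, and the connection is only dropped after the tcp_retries2 limit, which can take many minutes. One possible mitigation, not taken from this patch, is TCP_USER_TIMEOUT (Linux 2.6.37 and later), which bounds how long transmitted data may stay unacknowledged. set_keepalive_and_user_timeout() is a hypothetical helper, and the values passed to it are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: keepalive covers the idle case, while
 * TCP_USER_TIMEOUT (Linux-specific) covers the case where unacked data
 * keeps the retransmission timer armed and keepalive never fires.
 */
static int set_keepalive_and_user_timeout(int fd, int idle_s, int intvl_s,
                                          int cnt, unsigned int user_timeout_ms)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Seconds of idleness before the first keepalive probe. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    /* Max milliseconds transmitted data may remain unacknowledged. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
                   sizeof(user_timeout_ms)) < 0)
        return -1;
    return 0;
}
```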

The GDB output is:

(gdb) info thread
  5 Thread 0x7fadb3a31700 (LWP 4994)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 0x7fadb3230700 (LWP 4995)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 0x7fadb2a2f700 (LWP 4996)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 0x7fadb222e700 (LWP 5056)  0x00007fadb6dee7f3 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
* 1 Thread 0x7fadb7b43700 (LWP 4961)  0x00007fadb6dfb533 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fadb222e700 (LWP 5056))]#0  0x00007fadb6dee7f3 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
87	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
	in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0  0x00007fadb6dee7f3 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x000000000040974a in wait_forward_request (req=<value optimized out>) at gateway.c:159
#2  gateway_forward_request (req=<value optimized out>) at gateway.c:291
#3  0x000000000040dc35 in do_process_work (work=0xdbbef0) at ops.c:1241
#4  0x000000000040d2d7 in run_short_thread (arg=0xdbb280) at work.c:75
#5  0x00007fadb772f971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#6  0x00007fadb6dfaf3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
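Thread 2 is a worker blocked in wait_forward_request() (gateway.c:159), where poll() is called with timeout=-1. Nothing in user space therefore ever gives up on the dead peer. As a hedged illustration, a finite timeout that is preserved across EINTR could look like the following. poll_bounded() is hypothetical and is not taken from sheepdog's gateway.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds, immune to wall-clock jumps. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/*
 * Hypothetical helper: like poll(), but timeout_ms must be >= 0 and the
 * deadline is kept across EINTR. Returns >0 if ready, 0 on timeout,
 * -1 on error.
 */
static int poll_bounded(struct pollfd *pfds, nfds_t nfds, int timeout_ms)
{
    long deadline = now_ms() + timeout_ms;

    for (;;) {
        long left = deadline - now_ms();
        if (left < 0)
            left = 0;
        int ret = poll(pfds, nfds, (int)left);
        if (ret >= 0 || errno != EINTR)
            return ret;
        /* Interrupted by a signal: retry with the remaining time only. */
    }
}
```

Picking the timeout is a trade-off. If it is too short, a slow but healthy peer is treated as dead. Any finite value, however, keeps the worker from blocking forever.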

We can see that the hang comes from __poll(), called with timeout=-1. The interesting thing is that
the connection is in the 'ESTAB' state and uses the RTO timer, even though I had already enabled keepalive on sockfd.
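The same state can also be inspected from inside the process rather than through ss, because Linux exposes it via getsockopt(TCP_INFO). The following is a minimal sketch, assuming Linux and glibc's struct tcp_info. dump_tcp_info() is a hypothetical helper. tcpi_state is compared against TCP_ESTABLISHED (what ss prints as ESTAB), and tcpi_rto is reported in microseconds, so the rto:120000 above would appear as 120000000.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fetch and print a subset of what `ss -i` shows. */
static int dump_tcp_info(int fd, struct tcp_info *out)
{
    socklen_t len = sizeof(*out);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, out, &len) < 0)
        return -1;
    printf("state=%u rto=%uus backoff=%u retransmits=%u unacked=%u\n",
           (unsigned)out->tcpi_state, (unsigned)out->tcpi_rto,
           (unsigned)out->tcpi_backoff, (unsigned)out->tcpi_retransmits,
           (unsigned)out->tcpi_unacked);
    return 0;
}
```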

Thanks,
Yuan


