[sheepdog] fix embryonic connection
Liu Yuan
namei.unix at gmail.com
Mon Sep 3 18:38:59 CEST 2012
This patch fixes a subtle problem in connect_to(), found by the refined 035 test,
which might leave a connection waiting for an unbounded period of time.
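For illustration only (this is not the patch itself, and connect_with_timeout() is a hypothetical helper, not sheepdog's connect_to()), one common way to keep a connect from waiting forever is to make the socket non-blocking, start the connect, and poll() for writability with a finite timeout. A minimal sketch, assuming a POSIX system:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: connect with an upper bound on the handshake time.
 * Returns 0 on success, -1 on failure with errno set (ETIMEDOUT if the
 * deadline passed before the connection completed).
 */
static int connect_with_timeout(int fd, const struct sockaddr *addr,
				socklen_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int flags, ret, err = 0, saved;
	socklen_t errlen = sizeof(err);

	assert(timeout_ms >= 0);

	/* Switch to non-blocking mode so connect() returns immediately. */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	ret = connect(fd, addr, len);
	if (ret < 0 && errno == EINPROGRESS) {
		/* Note: an EINTR restart waits the full timeout again. */
		do {
			ret = poll(&pfd, 1, timeout_ms);
		} while (ret < 0 && errno == EINTR);

		if (ret == 0) {
			errno = ETIMEDOUT;
			ret = -1;
		} else if (ret > 0) {
			/* Writable: fetch the real outcome of the connect. */
			if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0) {
				ret = -1;
			} else if (err != 0) {
				errno = err;
				ret = -1;
			} else {
				ret = 0;
			}
		}
	}

	/* Restore the original file status flags, preserving errno. */
	saved = errno;
	fcntl(fd, F_SETFL, flags);
	errno = saved;
	return ret;
}
```

A caller would pass the same address it would give to connect(), plus a deadline in milliseconds, and treat ETIMEDOUT like any other connect failure.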
There is a more subtle bug where poll() ends up waiting on the RTO timer instead of
the keepalive timer. For example, while running 035 I hit a program hang and noticed
the following in the output of the command
sudo ss -iopen
...
ESTAB 0 52 127.0.0.1:48339 127.0.0.1:7001
timer:(on,49sec,12) users:(("sheep",4961,16)) ino:855713 sk:ffff88007086de80 ts sack cubic wscale:7,7 rto:120000 rtt:18.75/7.5 ato:40 ssthresh:7 send 14.0Mbps rcv_space:32792
...
where sheep:7001 was killed already.
The GDB output is:
(gdb) info thread
5 Thread 0x7fadb3a31700 (LWP 4994) pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
4 Thread 0x7fadb3230700 (LWP 4995) pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
3 Thread 0x7fadb2a2f700 (LWP 4996) pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
2 Thread 0x7fadb222e700 (LWP 5056) 0x00007fadb6dee7f3 in __poll (fds=<value optimized out>,
nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
* 1 Thread 0x7fadb7b43700 (LWP 4961) 0x00007fadb6dfb533 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fadb222e700 (LWP 5056))]#0 0x00007fadb6dee7f3 in __poll (fds=<value optimized out>,
nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
87 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0 0x00007fadb6dee7f3 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000040974a in wait_forward_request (req=<value optimized out>) at gateway.c:159
#2 gateway_forward_request (req=<value optimized out>) at gateway.c:291
#3 0x000000000040dc35 in do_process_work (work=0xdbbef0) at ops.c:1241
#4 0x000000000040d2d7 in run_short_thread (arg=0xdbb280) at work.c:75
#5 0x00007fadb772f971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#6 0x00007fadb6dfaf3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()
We can see that the thread is blocked in __poll() with timeout=-1. The interesting
thing is that the connection is shown as 'ESTAB' and uses the RTO timer, even though
I had already set keepalive on the sockfd.
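One likely reading of the ss output above: Send-Q is non-zero (52 bytes unacknowledged), and on Linux keepalive probes are only sent on an idle connection, so while data is in flight the retransmission (RTO) timer governs how long the socket stays in ESTAB. The Linux-specific TCP_USER_TIMEOUT socket option bounds how long transmitted data may remain unacknowledged. The sketch below is an assumption about how both limits could be set together; set_dead_peer_limits() and its parameters are hypothetical names, not sheepdog code:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (Linux only): enable keepalive for idle connections
 * and cap the time unacknowledged data may stay in flight.
 * Returns 0 on success, -1 on failure with errno set by setsockopt().
 */
static int set_dead_peer_limits(int fd, int idle_s, int intvl_s, int cnt,
				unsigned int user_timeout_ms)
{
	int on = 1;

	assert(idle_s > 0 && intvl_s > 0 && cnt > 0);

	/* Keepalive covers the idle case: no data queued, peer silent. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;

	/* User timeout covers the busy case: data sent but never ACKed. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
		return -1;

	return 0;
}
```

Independently of the socket options, passing a finite timeout to poll() instead of -1 would also let the waiting thread notice a dead peer and give up on its own schedule.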
Thanks,
Yuan