[stgt] tgtd segfault during heavy I/O
fujita.tomonori at lab.ntt.co.jp
Sun Jul 10 10:59:07 CEST 2011
On Mon, 4 Jul 2011 23:36:39 +0800
Kiefer Chang <zapchang at gmail.com> wrote:
> Dear Tomonori,
> We got a segfault during heavy I/O. Hope you can give some suggestions.
> 7 machines; each machine runs a VM, and each VM uses 10 targets on
> tgtd. Each machine is equipped with 1GbE cards.
> So there are at least 70+ volumes on tgtd.
> tgtd (1.0.16) is running on a machine with two bonded 10GbE cards.
> LVM logical volumes are used as the backing stores for the targets.
> (The physical volume is on software RAID 5.)
> Both the initiator side and the target side are running CentOS 5.4.
> I tried setting up the system so a core dump is generated when the
> problem hits. The core dump file seems incomplete: it is 8GB+ in
> apparent size but only uses about 30~50MB of disk capacity.
> So I used gdb to attach to a debug build (make DEBUG=1) of tgtd.
> (The symptom is much easier to reproduce during heavy I/O and with
> an optimized build of tgtd (-O2).)
> When the symptom shows up, I get the following backtrace (only the
> latest part is pasted):
> [New Thread 0x2aabbaa5d940 (LWP 20176)]
> [New Thread 0x2aabbb45e940 (LWP 20177)]
> [New Thread 0x2aabbbe5f940 (LWP 20227)]
> [New Thread 0x2aabbc860940 (LWP 20228)]
> [New Thread 0x2aabbd261940 (LWP 20229)]
> [New Thread 0x2aabbdc62940 (LWP 20230)]
> [New Thread 0x2aabbe663940 (LWP 20258)]
> [New Thread 0x2aabbf064940 (LWP 20259)]
> [New Thread 0x2aabbfa65940 (LWP 20265)]
> [New Thread 0x2aabc0466940 (LWP 20266)]
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> 1524 if (task->tag == req->itt)
> (gdb) bt
> #0 0x000000000040889d in iscsi_data_out_rx_start (conn=0x10f26028) at
> #1 0x0000000000409360 in iscsi_task_rx_start (conn=0x10f26028) at
> #2 0x0000000000409d42 in iscsi_rx_handler (conn=0x10f26028) at
> #3 0x0000000000411ba6 in iscsi_tcp_event_handler (fd=445, events=5,
> data=0x10f26028) at iscsi/iscsi_tcp.c:158
> #4 0x0000000000417365 in event_loop () at tgtd.c:454
> #5 0x0000000000417a16 in main (argc=1, argv=0x7fffd5eb9a98) at tgtd.c:640
> (gdb) print task
> $5 = (struct iscsi_task *) 0xffffffffffffff90
> (gdb) print req
> $6 = (struct iscsi_data *) 0x10f26148
> (gdb) p task->req
> Cannot access memory at address 0xffffffffffffff90
> (gdb) p task->rsp
> Cannot access memory at address 0xffffffffffffffc0
> (gdb) p task->tag
> Cannot access memory at address 0xfffffffffffffff0
> (gdb) p req->opcode
> $30 = 5 '\005'
> (gdb) p req->flags
> $31 = 128 '\200'
> (gdb) p req->rsvd2
> $32 = "\000"
> The system log can be downloaded from here:
> It seems *task* was freed and then referenced again.
Is this related to tmf (aborting a task, etc.)? Your next report seems to be.
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html