[stgt] help tgt segfault
Tomasz Chmielewski
mangoo at wpkg.org
Tue Dec 16 12:47:20 CET 2008
Tomasz Chmielewski wrote:
> FUJITA Tomonori wrote:
>> On Thu, 11 Dec 2008 07:58:30 -0800
>> "Jesse Nelson" <spheromak at gmail.com> wrote:
>>
>>> We're running a vanilla 2.6.27.4 kernel with tgt 0.9.2, with about 30
>>> targets and about 10mb/s throughput.
>>> I am constantly (daily) seeing tgtd segfault. No real deep info, just
>>> this error in the logs:
>>> segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in
>>> tgtd[400000+23000]
>>> Any ideas or suggestions on how I can dig deeper here?
>>
>> Can you run gdb with tgtd?
>>
>> If you can't, can you give very detailed information about what
>> you are doing, so that I can do the same thing and
>> reproduce the problem?
>
> I'm seeing those occasionally too (one tgtd process dies), but rather
> *very* rarely.
>
> It doesn't seem to depend on load type, number of connected/working
> initiators, configured targets etc., and I'm not sure how to
> reproduce it.
>
> One thing that comes to my mind is that one tgtd process dies when an
> initiator wants to read data and tgtd can't "deliver" it immediately
> (i.e., I/O "frozen" because of SATA resets/exceptions/timeouts). It
> doesn't always happen on such SATA timeouts and is therefore hard to
> reproduce.
I can reproduce it reliably on a software RAID-5 array with a broken disk (one with bad blocks).
Just start badblocks -v /dev/broken/disk, wait until it reaches a bad area of the disk, and tgtd will segfault.
I guess it would also segfault on any other prolonged I/O access.
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
res 40/00:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
sd 4:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
sd 4:0:0:0: [sdd] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
1d 78 cd 3d
sd 4:0:0:0: [sdd] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sdd, sector 494456120
printk: 16 messages suppressed.
Buffer I/O error on device sdd, logical block 61807015
ata4: EH complete
sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdd] Write Protect is off
sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: edma_err 0x00000084, EDMA self-disable
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
res 51/40:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4: hard resetting link
During the scan, which has revealed ~300 bad blocks and caused lots of SATA timeouts/resets,
tgtd has segfaulted three times so far (with a check script started via cron every minute).
Dec 16 12:03:49 megathecus kernel: tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
Dec 16 12:08:18 megathecus kernel: tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4
Dec 16 12:44:57 megathecus kernel: tgtd[3649]: segfault at 000001e4 eip 0804c2fa esp 77bc3f30 error 4
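For reading the segfault lines above, here is a minimal Python sketch that pulls out the fault address and instruction pointer and decodes the "error N" field. It assumes the usual x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The names SEGFAULT_RE and decode_segfault are only illustrative and are not part of tgt or the kernel.

```python
import re

# x86 page-fault error-code bits, as printed by the kernel after "error"
PF_PROT = 0x1   # set: protection violation, clear: page not present
PF_WRITE = 0x2  # set: write access, clear: read access
PF_USER = 0x4   # set: fault happened in user mode

# Matches both the 32-bit form ("eip"/"esp") and the newer form ("ip"/"sp").
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"[er]?ip (?P<ip>[0-9a-f]+) "
    r"[er]?sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return a dict describing one kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "access": "write" if err & PF_WRITE else "read",
        "cause": "protection violation" if err & PF_PROT else "page not present",
        "mode": "user" if err & PF_USER else "kernel",
    }


if __name__ == "__main__":
    lines = [
        "tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4",
        "tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4",
    ]
    for line in lines:
        info = decode_segfault(line)
        print(hex(info["addr"]), hex(info["ip"]),
              info["mode"], info["access"], info["cause"])
```

Under that decoding, "error 4" is a user-mode read of a page that is not present, and "error 6" in the original report is a user-mode write. The fault addresses here (0x220, 0x1e4) are very small, which would be consistent with accessing a structure member through a NULL pointer, but that is only a guess until someone gets a backtrace from gdb.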
--
Tomasz Chmielewski
http://wpkg.org