[stgt] help tgt segfault

Jesse Nelson spheromak at gmail.com
Tue Dec 16 12:25:20 CET 2008


Of course, since I posted this I haven't had a segfault, which is good,
but it also doesn't shed any more light on the problem.

We did, however, solve one issue with a switch that was causing some big
latencies. My assumption is that network latency was somehow to blame
for the frequent segfaults in tgtd. This sort of goes along with what
Tomasz is saying.
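
On the "how can I dig deeper" question: even without gdb, the kernel's
segfault line carries a little information. Below is a minimal Python
sketch that decodes it, assuming the standard x86 page-fault error-code
bits (bit 0 = protection violation, bit 1 = write, bit 2 = user mode).
The name decode_segfault is only illustrative, and the regex covers just
the two log formats quoted in this thread.

```python
import re

# Matches both the x86-64 ("ip"/"sp") and the older 32-bit ("eip"/"esp") format.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) e?ip (?P<ip>[0-9a-f]+) "
    r"e?sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

def decode_segfault(line):
    """Decode a kernel 'segfault at ...' log line into a small dict."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    info = {
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "cause": "protection violation" if err & 1 else "page not present",
        "offset_in_object": None,
    }
    # The "in tgtd[base+size]" part is only present in the newer format.
    if m.group("base"):
        info["offset_in_object"] = info["ip"] - int(m.group("base"), 16)
    return info

if __name__ == "__main__":
    print(decode_segfault(
        "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 "
        "error 6 in tgtd[400000+23000]"
    ))
```

For the two lines in this thread, that gives a user-mode write to an
unmapped page at address 0x8 (error 6) and a user-mode read of an unmapped
page at 0x220 (error 4). Faults at such small addresses would be consistent
with accessing a field through a NULL pointer, though that is only a guess.
If tgtd was built with debug symbols and is not position-independent,
something like "addr2line -e tgtd 0x40ebed" may map the ip to a source line.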

On Mon, Dec 15, 2008 at 1:48 AM, Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> FUJITA Tomonori wrote:
>>
>> On Thu, 11 Dec 2008 07:58:30 -0800
>> "Jesse Nelson" <spheromak at gmail.com> wrote:
>>
>>> We're running a vanilla 2.6.27.4 kernel with tgt 0.9.2, with about 30
>>> targets and about 10mb/s throughput.
>>> I am constantly (daily) seeing tgtd segfault. No real deep info, just
>>> this error in the logs:
>>>    segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in
>>> tgtd[400000+23000]
>>> Any ideas or suggestions on how I can dig deeper here?
>>
>> Can you run gdb with tgtd?
>>
>> If you can't, can you give very detailed information about what
>> you are doing, so that I can do the same thing to reproduce the
>> problem?
>
> I'm seeing those occasionally too (one tgtd process dies), but rather *very*
> rarely.
>
> It doesn't seem to depend on load type, number of connected/working
> initiators, configured targets etc., and I'm not sure how to reproduce it.
>
> One thing that comes to mind is that a tgtd process dies when an initiator
> wants to read data and tgtd can't "deliver" it immediately (i.e., I/O is
> "frozen" because of SATA resets/exceptions/timeouts). It doesn't always
> happen on such SATA timeouts and is therefore hard to reproduce.
>
>
> Look at this log - tgtd segfaulted just after SATA timeouts (after ~50 days
> of working properly).
> This happened with a tgtd version fetched on 2008-Oct-24, running on x86,
> with just two initiators connected; load on one target was perhaps about
> 5 MB/s, and on the second target close to 0 MB/s.
>
> Dec 11 21:57:37 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Dec 11 21:57:37 megathecus kernel: ata4.00: cmd
> 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
> Dec 11 21:57:37 megathecus kernel:          res
> 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
> Dec 11 21:57:37 megathecus kernel: ata4.00: status: { DRDY }
> Dec 11 21:57:37 megathecus kernel: ata4: soft resetting link
> Dec 11 21:57:37 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113
> SControl 300)
> Dec 11 21:57:37 megathecus kernel: ata4.00: configured for UDMA/133
> Dec 11 21:57:37 megathecus kernel: ata4: EH complete
> Dec 11 21:58:07 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Dec 11 21:58:07 megathecus kernel: ata4.00: cmd
> 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
> Dec 11 21:58:07 megathecus kernel:          res
> 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
> Dec 11 21:58:07 megathecus kernel: ata4.00: status: { DRDY }
> Dec 11 21:58:07 megathecus kernel: ata4: soft resetting link
> Dec 11 21:58:07 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113
> SControl 300)
> Dec 11 21:58:07 megathecus kernel: ata4.00: configured for UDMA/133
> Dec 11 21:58:07 megathecus kernel: ata4: EH complete
> Dec 11 21:58:08 megathecus kernel: tgtd[2567]: segfault at 00000220 eip
> 0804f0b5 esp 77abdac0 error 4
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte
> hardware sectors (400088 MB)
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte
> hardware sectors (400088 MB)
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
>
>
> I reported a similar issue in June 2008 - see the thread titled
> "disk kicked out of RAID -> tgtd segmentation fault":
>
> http://lists.wpkg.org/pipermail/stgt/2008-June/thread.html#1702
> http://lists.wpkg.org/pipermail/stgt/2008-July/thread.html#1746
>
> Can it be related somehow?
>
>
> --
> Tomasz Chmielewski
>