[stgt] help tgt segfault

Tue Dec 16 12:47:20 CET 2008

Tomasz Chmielewski wrote:
> FUJITA Tomonori wrote:
>> On Thu, 11 Dec 2008 07:58:30 -0800
>> "Jesse Nelson" <spheromak at gmail.com> wrote:
>>
>>> We're running a vanilla 2.6.27.4 kernel with tgt 0.9.2, with about 30
>>> targets and about 10mb/s of throughput.
>>> I am constantly (daily) seeing tgtd segfault. There's no real deep info,
>>> just this error in the logs:
>>>     segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]
>>> Any ideas or suggestions on how I can dig deeper here?
>>
>> Can you run tgtd under gdb?
>>
>> If you can't, can you give very detailed information about what
>> you are doing, so that I can do the same thing and reproduce the
>> problem?
> 
> I'm seeing those occasionally too (one tgtd process dies), but
> *very* rarely.
> 
> It doesn't seem to depend on the load type, the number of connected/working
> initiators, the configured targets, etc., and I'm not sure how to reproduce it.
> 
> One thing that comes to my mind is that a tgtd process dies when an
> initiator wants to read data and tgtd can't "deliver" it immediately
> (i.e., I/O is "frozen" because of SATA resets/exceptions/timeouts). It
> doesn't always happen on such SATA timeouts and is therefore hard to
> reproduce.
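One way to dig deeper without gdb is to decode the kernel's segfault line itself. On x86 the number after "error" is the page-fault error code (printed in hex): bit 0 set means a protection violation (clear means the page was not present), bit 1 set means a write (clear means a read), and bit 2 set means the fault happened in user mode. The helper below, decode_segfault, is a hypothetical sketch written for this thread, not part of tgt; its regex assumes the two message formats that appear in this thread (the "ip/sp" form and the older 32-bit "eip/esp" form).

```python
import re

# x86 page-fault error code bits, as printed by the kernel ("error N", hex).
# Each entry maps a bit to (meaning if set, meaning if clear).
PF_BITS = {
    0x1: ("protection violation", "page not present"),
    0x2: ("write", "read"),
    0x4: ("user mode", "kernel mode"),
}

# Assumed to match both the "ip/sp" and the older 32-bit "eip/esp" formats.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) e?ip (?P<ip>[0-9a-f]+) "
    r"e?sp [0-9a-f]+ error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the fault address, instruction pointer and decoded error bits."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    flags = [on if err & bit else off for bit, (on, off) in PF_BITS.items()]
    return int(m.group("addr"), 16), int(m.group("ip"), 16), flags


addr, ip, flags = decode_segfault(
    "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 "
    "error 6 in tgtd[400000+23000]"
)
print(hex(addr), hex(ip), flags)
```

For the line quoted above, "error 6" decodes to a user-mode write to a page that was not present, at address 0x8. The ip 0x40ebed lies inside the mapping tgtd[400000+23000] (0x400000 to 0x423000), i.e. in the tgtd executable itself, so with an unstripped tgtd binary something like "addr2line -e /path/to/tgtd 0x40ebed" (the path is a placeholder) should point at a source line. A fault address this close to zero is typical of dereferencing a field of a NULL struct pointer, but that is only a hypothesis until confirmed with gdb.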

I can reproduce it reliably on a software RAID-5 array with a broken disk (with badblocks).

Just start badblocks -v /dev/broken/disk, wait until it reaches a broken area of the disk, and tgtd will segfault.

I guess it will also segfault whenever I/O takes a long time to complete.

ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
sd 4:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08                                                                                                     
sd 4:0:0:0: [sdd] Sense Key : 0xb [current] [descriptor]                                                                                                    
Descriptor sense data with sense descriptors (in hex):                                                                                                      
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        1d 78 cd 3d
sd 4:0:0:0: [sdd] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sdd, sector 494456120
printk: 16 messages suppressed.
Buffer I/O error on device sdd, logical block 61807015
ata4: EH complete
sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdd] Write Protect is off
sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: edma_err 0x00000084, EDMA self-disable
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4: hard resetting link
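The log above reports the failing location in several different ways, and a small sketch can confirm they all point at the same spot on sdd. It assumes libata's LBA48 taskfile dump layout for the "cmd"/"res" fields (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the SCSI descriptor-format sense layout (response code 0x72, information descriptor type 0x00). The helpers taskfile_lba and sense_information are hypothetical names used only for illustration.

```python
def taskfile_lba(tf):
    """Decode a libata taskfile dump such as the 'cmd'/'res' fields above.

    Assumed layout (LBA48):
    cmd/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/dev
    """
    head, low, high, _dev = tf.split("/")
    _feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    lba = ((hlbah << 40) | (hlbam << 32) | (hlbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    count = (hnsect << 8) | nsect
    return int(head, 16), lba, count


def sense_information(sense):
    """Return (sense key, ASC, ASCQ, information) from descriptor-format sense."""
    if sense[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = sense[1] & 0x0F, sense[2], sense[3]
    info = None
    pos, end = 8, 8 + sense[7]  # byte 7: additional sense length
    while pos + 1 < end:
        dtype, dlen = sense[pos], sense[pos + 1]
        # Type 0x00 is the information descriptor; 0x80 in its byte 2 is VALID.
        if dtype == 0x00 and sense[pos + 2] & 0x80:
            info = int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return key, asc, ascq, info


cmd_op, cmd_lba, cmd_count = taskfile_lba("25/00:08:38:cd:78/00:00:1d:00:00/e0")
_status, res_lba, _ = taskfile_lba("40/00:00:3d:cd:78/40:00:1d:00:00/e0")
key, asc, ascq, info_lba = sense_information(bytes.fromhex(
    "72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 1d 78 cd 3d"))

print(hex(cmd_op), cmd_lba, cmd_count * 512)  # opcode, start LBA, bytes read
print(res_lba, info_lba)                       # LBA reported back by the drive
print(cmd_lba // 8)                            # start LBA in 4096-byte blocks
```

The command is 0x25 (ATA READ DMA EXT) for 8 sectors (4096 bytes) starting at LBA 494456120, which is exactly the sector in the end_request line; 494456120 / 8 = 61807015 matches the "Buffer I/O error ... logical block" line, assuming 4096-byte blocks. The drive's returned taskfile and the sense information field both give LBA 494456125, five sectors into that read, and sense key 0xb is ABORTED COMMAND.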

During the scan, which revealed ~300 bad blocks (though there were lots of SATA timeouts/resets),
tgtd has segfaulted three times so far (with a check script started from cron every minute).

Dec 16 12:03:49 megathecus kernel: tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
Dec 16 12:08:18 megathecus kernel: tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4
Dec 16 12:44:57 megathecus kernel: tgtd[3649]: segfault at 000001e4 eip 0804c2fa esp 77bc3f30 error 4
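Two of these three crashes share both the instruction pointer (0804c2fa) and the fault address (000001e4), which suggests a single code path rather than random corruption. A minimal sketch for grouping crashes this way, assuming the 32-bit "eip/esp" message format shown above (crash_sites is a hypothetical helper, not a tgt tool):

```python
import re
from collections import Counter

# Assumed format: "<comm>[<pid>]: segfault at <addr> eip <ip> esp <sp> error <code>"
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"e?ip (?P<ip>[0-9a-f]+) e?sp [0-9a-f]+ error (?P<err>[0-9a-f]+)"
)

LOG = """\
Dec 16 12:03:49 megathecus kernel: tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
Dec 16 12:08:18 megathecus kernel: tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4
Dec 16 12:44:57 megathecus kernel: tgtd[3649]: segfault at 000001e4 eip 0804c2fa esp 77bc3f30 error 4
"""


def crash_sites(log_text):
    """Count crashes per (instruction pointer, fault address) pair."""
    sites = Counter()
    for m in SEGV_RE.finditer(log_text):
        sites[(int(m.group("ip"), 16), int(m.group("addr"), 16))] += 1
    return sites


for (ip, addr), n in crash_sites(LOG).most_common():
    print(f"eip {ip:#010x} fault addr {addr:#x}: {n} crash(es)")
```

The most frequent eip is the natural first target for "info symbol" in gdb or for addr2line, provided it is run against the exact tgtd binary that crashed, built with debug information.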

-- 
Tomasz Chmielewski
http://wpkg.org