Tomasz Chmielewski wrote:
> FUJITA Tomonori wrote:
>> On Thu, 11 Dec 2008 07:58:30 -0800
>> "Jesse Nelson" <spheromak at gmail.com> wrote:
>>
>>> were running vanila 2.6.27.4 kern with tgt 0.9.2 with about 30
>>> targets and about 10mb/s throughput
>>> i am constantly (daily) seeing tgtd segfault. no real deep info just
>>> this error in the logs:
>>> segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in
>>> tgtd[400000+23000]
>>> any ideas or suggestions how i can dig deeper here ?
>>
>> Can you run gdb with tgtd?
>>
>> If you can't, can you give the very detailed information about what
>> you are doing, which enable me to do the same thing you do to
>> reproduce the problem.
>
> I'm seeing those occasionally too (one tgtd process dies), but rather
> *very* rarely.
>
> It doesn't seem to depend on load type, number or connected/working
> initiators, configured targets etc. and I'm not sure how to reproduce it.
>
> One thing that comes to my mind is that one tgtd process dies when
> initiator wants to read data and tgtd can't "deliver" it immediately
> (i.e., I/O "frozen" because of SATA resets/exceptions/timeouts). It
> doesn't happen always on such SATA timeouts and is therefore hard to
> reproduce.

I can reproduce it reliably on a software RAID-5 array with a broken
disk, using badblocks.

Just start badblocks -v /dev/broken/disk and wait until it reaches a bad
area of the disk; tgtd will segfault. I guess it will also segfault on
prolonged I/O access.

ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
sd 4:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
sd 4:0:0:0: [sdd] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        1d 78 cd 3d
sd 4:0:0:0: [sdd] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sdd, sector 494456120
printk: 16 messages suppressed.
Buffer I/O error on device sdd, logical block 61807015
ata4: EH complete
sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdd] Write Protect is off
sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: edma_err 0x00000084, EDMA self-disable
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4: hard resetting link

During the scan, which revealed ~300 bad blocks (but there were lots of
SATA timeouts/resets), tgtd segfaulted three times so far (with a check
script started via cron every minute).

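For comparing crashes across runs, the kernel's segfault lines can be parsed and grouped by faulting instruction pointer. The following is only a rough sketch, not part of tgt: the function names parse_segfault and crash_sites are made up for illustration, and the regular expression assumes the two line formats seen in this thread (32-bit "eip/esp" and 64-bit "ip/sp"). The error-code decoding follows the x86 page-fault error bits (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re
from collections import Counter

# Matches both the 32-bit ("eip ... esp ...") and 64-bit ("ip ... sp ...")
# segfault lines quoted in this thread. Assumption: other kernels may
# format the line differently.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"[er]?ip (?P<ip>[0-9a-f]+) "
    r"[er]?sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+)"
)


def parse_segfault(line):
    """Return the fields of one kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "addr": int(m.group("addr"), 16),   # faulting address
        "ip": int(m.group("ip"), 16),       # instruction that faulted
        "page_present": bool(err & 1),      # x86 page-fault bit 0
        "write": bool(err & 2),             # bit 1: write (else read)
        "user": bool(err & 4),              # bit 2: fault in user mode
    }


def crash_sites(lines):
    """Count segfaults per instruction pointer across many log lines."""
    parsed = (parse_segfault(line) for line in lines)
    return Counter(f["ip"] for f in parsed if f is not None)
```

Feeding it the tgtd lines from a kernel log (for example, the output of dmesg filtered for "segfault") gives one counter entry per distinct crash site; each address could then be resolved against an unstripped tgtd binary with gdb or addr2line, assuming debug symbols are available.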
Dec 16 12:03:49 megathecus kernel: tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
Dec 16 12:08:18 megathecus kernel: tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4
Dec 16 12:44:57 megathecus kernel: tgtd[3649]: segfault at 000001e4 eip 0804c2fa esp 77bc3f30 error 4

-- 
Tomasz Chmielewski
http://wpkg.org