[stgt] tgtd segfault with software RAID, hard resetting link
Tomasz Chmielewski
mangoo at wpkg.org
Tue Apr 7 10:50:52 CEST 2009
Tomasz Chmielewski wrote:
> Last night I had a SATA timeout on a drive in software RAID-1.
>
> It recovered just fine, but unfortunately, tgtd crashed and some of the
> initiators had I/O errors.
>
> This is how the kernel log looks on the target - everything happened in
> one second (SATA timeout, hard reset, tgtd segfault) - is it a known
> issue? I use tgt-0.9.5.
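Here is the syslog including the tgtd entries. It looks as if tgtd "noticed" that something was wrong 13 seconds before the kernel did: the first abort_task_set is logged at 04:02:41, the first ATA exception at 04:02:54.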
Apr 7 04:02:41 san3 tgtd: abort_task_set(988) found 10000a02 0
Apr 7 04:02:41 san3 tgtd: conn_close(100) connection closed, 0x2586498 2
Apr 7 04:02:41 san3 tgtd: conn_close(106) sesson 0x258fd30 1
Apr 7 04:02:43 san3 tgtd: abort_task_set(988) found 40000a04 0
Apr 7 04:02:43 san3 tgtd: conn_close(100) connection closed, 0x2579c88 2
Apr 7 04:02:43 san3 tgtd: conn_close(106) sesson 0x258ea10 1
Apr 7 04:02:44 san3 tgtd: abort_task_set(988) found 20000a02 0
Apr 7 04:02:45 san3 tgtd: conn_close(100) connection closed, 0x25821e8 3
Apr 7 04:02:45 san3 tgtd: conn_close(106) sesson 0x258f3a0 1
Apr 7 04:02:47 san3 tgtd: abort_task_set(988) found a01 0
Apr 7 04:02:47 san3 tgtd: conn_close(100) connection closed, 0x2569fe8 2
Apr 7 04:02:47 san3 tgtd: conn_close(106) sesson 0x256b450 1
Apr 7 04:02:49 san3 tgtd: abort_task_set(988) found a01 0
Apr 7 04:02:49 san3 tgtd: conn_close(100) connection closed, 0x25e4d08 2
Apr 7 04:02:49 san3 tgtd: conn_close(106) sesson 0x25e4fd0 1
Apr 7 04:02:50 san3 tgtd: abort_task_set(988) found 30000a02 0
Apr 7 04:02:50 san3 tgtd: conn_close(100) connection closed, 0x258a748 2
Apr 7 04:02:50 san3 tgtd: conn_close(106) sesson 0x2591050 1
Apr 7 04:02:51 san3 tgtd: abort_task_set(988) found 10000a02 0
Apr 7 04:02:51 san3 tgtd: abort_task_set(988) found a01 0
Apr 7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x25919a8 2
Apr 7 04:02:51 san3 tgtd: conn_close(106) sesson 0x259a770 1
Apr 7 04:02:51 san3 tgtd: abort_task_set(988) found a01 0
Apr 7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x2561238 3
Apr 7 04:02:51 san3 tgtd: conn_close(106) sesson 0x25614d0 1
Apr 7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x25d7578 2
Apr 7 04:02:51 san3 tgtd: conn_close(106) sesson 0x25d7840 1
Apr 7 04:02:54 san3 kernel: [153755.828053] ata1.00: exception Emask 0x0 SAct 0x3f80f SErr 0x0 action 0x6 frozen
Apr 7 04:02:54 san3 kernel: [153755.828132] ata1.00: cmd 60/08:00:77:ba:38/00:00:25:00:00/40 tag 0 ncq 4096 in
Apr 7 04:02:54 san3 kernel: [153755.828134] res 40/00:28:1f:7a:78/00:00:1c:00:00/40 Emask 0x4 (timeout)
Apr 7 04:02:54 san3 kernel: [153755.828224] ata1.00: status: { DRDY }
<9 more timeouts here>
Apr 7 04:02:54 san3 kernel: [153755.829298] ata1.00: cmd 61/18:88:8f:07:e8/00:00:0c:00:00/40 tag 17 ncq 12288 out
Apr 7 04:02:54 san3 kernel: [153755.829299] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 7 04:02:54 san3 kernel: [153755.829386] ata1.00: status: { DRDY }
Apr 7 04:02:54 san3 kernel: [153755.829415] ata1: hard resetting link
Apr 7 04:02:54 san3 tgtd: abort_task_set(988) found 10000a01 0
Apr 7 04:02:54 san3 kernel: [153756.312026] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 7 04:02:54 san3 kernel: [153756.386453] ata1.00: configured for UDMA/133
Apr 7 04:02:54 san3 kernel: [153756.386548] ata1: EH complete
Apr 7 04:02:54 san3 kernel: [153756.386733] sd 1:0:0:0: [sdb] 2930277168 512-byte hardware sectors (1500302 MB)
Apr 7 04:02:54 san3 kernel: [153756.386841] sd 1:0:0:0: [sdb] Write Protect is off
Apr 7 04:02:54 san3 kernel: [153756.386889] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Apr 7 04:02:54 san3 kernel: [153756.386961] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 7 04:02:54 san3 kernel: [153756.406006] tgtd[20545]: segfault at 31 ip 40c32d sp 75df00c0 error 4 in tgtd[400000+25000]
Apr 7 04:02:55 san3 tgtd: conn_close(100) connection closed, 0x257df38 2
Apr 7 04:02:55 san3 tgtd: conn_close(106) sesson 0x25906c0 1
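
For anyone trying to read the segfault line above, here is a minimal sketch in Python that splits it into its fields. It assumes the usual x86 page-fault error code bits (bit 0: page was present, bit 1: write access, bit 2: fault came from user mode); the function name decode_segfault and the regular expression are only illustrative and are not part of tgt or the kernel.

```python
import re

# Segfault line copied from the kernel log above.
LINE = ("tgtd[20545]: segfault at 31 ip 40c32d sp 75df00c0 "
        "error 4 in tgtd[400000+25000]")

# Hypothetical pattern for this log format; all numbers are hex.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the fields of a kernel 'segfault at ...' message as a dict."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),   # address that was accessed
        "offset": ip - base,                # instruction offset in the mapping
        "page_present": bool(err & 1),      # assumed x86 error code bit 0
        "write": bool(err & 2),             # assumed x86 error code bit 1
        "user_mode": bool(err & 4),         # assumed x86 error code bit 2
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["object"], hex(info["fault_addr"]), hex(info["offset"]))
    print("read" if not info["write"] else "write",
          "from user mode" if info["user_mode"] else "from kernel mode")
```

Under those assumptions, "error 4" is a user-mode read of an unmapped page, and the fault address 0x31 is close to zero, which would be consistent with a structure member being read through a NULL pointer. With a tgtd binary built with debug symbols, something like "addr2line -e tgtd 0x40c32d" may point at the source line, provided the binary matches the one that crashed.
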
--
Tomasz Chmielewski
http://wpkg.org