[stgt] tgtd segfault with software RAID, hard resetting link

Tomasz Chmielewski mangoo at wpkg.org
Tue Apr 7 10:50:52 CEST 2009


Tomasz Chmielewski wrote:
> Last night I had a SATA timeout on a drive in software RAID-1.
> 
> It recovered just fine, but unfortunately, tgtd crashed and some of the 
> initiators had I/O errors.
> 
> This is how the kernel log looks on the target - everything happened in 
> one second (SATA timeout, hard reset, tgtd segfault) - is it a known 
> issue? I use tgt-0.9.5.

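Here is the syslog including the tgtd entries. Did tgtd "notice" that something was wrong 13 seconds before the kernel did? Its first abort_task_set is logged at 04:02:41, while the first ATA exception is logged at 04:02:54.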


Apr  7 04:02:41 san3 tgtd: abort_task_set(988) found 10000a02 0
Apr  7 04:02:41 san3 tgtd: conn_close(100) connection closed, 0x2586498 2
Apr  7 04:02:41 san3 tgtd: conn_close(106) sesson 0x258fd30 1
Apr  7 04:02:43 san3 tgtd: abort_task_set(988) found 40000a04 0
Apr  7 04:02:43 san3 tgtd: conn_close(100) connection closed, 0x2579c88 2
Apr  7 04:02:43 san3 tgtd: conn_close(106) sesson 0x258ea10 1
Apr  7 04:02:44 san3 tgtd: abort_task_set(988) found 20000a02 0
Apr  7 04:02:45 san3 tgtd: conn_close(100) connection closed, 0x25821e8 3
Apr  7 04:02:45 san3 tgtd: conn_close(106) sesson 0x258f3a0 1
Apr  7 04:02:47 san3 tgtd: abort_task_set(988) found a01 0
Apr  7 04:02:47 san3 tgtd: conn_close(100) connection closed, 0x2569fe8 2
Apr  7 04:02:47 san3 tgtd: conn_close(106) sesson 0x256b450 1
Apr  7 04:02:49 san3 tgtd: abort_task_set(988) found a01 0
Apr  7 04:02:49 san3 tgtd: conn_close(100) connection closed, 0x25e4d08 2
Apr  7 04:02:49 san3 tgtd: conn_close(106) sesson 0x25e4fd0 1
Apr  7 04:02:50 san3 tgtd: abort_task_set(988) found 30000a02 0
Apr  7 04:02:50 san3 tgtd: conn_close(100) connection closed, 0x258a748 2
Apr  7 04:02:50 san3 tgtd: conn_close(106) sesson 0x2591050 1
Apr  7 04:02:51 san3 tgtd: abort_task_set(988) found 10000a02 0
Apr  7 04:02:51 san3 tgtd: abort_task_set(988) found a01 0
Apr  7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x25919a8 2
Apr  7 04:02:51 san3 tgtd: conn_close(106) sesson 0x259a770 1
Apr  7 04:02:51 san3 tgtd: abort_task_set(988) found a01 0
Apr  7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x2561238 3
Apr  7 04:02:51 san3 tgtd: conn_close(106) sesson 0x25614d0 1
Apr  7 04:02:51 san3 tgtd: conn_close(100) connection closed, 0x25d7578 2
Apr  7 04:02:51 san3 tgtd: conn_close(106) sesson 0x25d7840 1
Apr  7 04:02:54 san3 kernel: [153755.828053] ata1.00: exception Emask 0x0 SAct 0x3f80f SErr 0x0 action 0x6 frozen
Apr  7 04:02:54 san3 kernel: [153755.828132] ata1.00: cmd 60/08:00:77:ba:38/00:00:25:00:00/40 tag 0 ncq 4096 in
Apr  7 04:02:54 san3 kernel: [153755.828134]          res 40/00:28:1f:7a:78/00:00:1c:00:00/40 Emask 0x4 (timeout)
Apr  7 04:02:54 san3 kernel: [153755.828224] ata1.00: status: { DRDY }

<9 more timeouts here>

Apr  7 04:02:54 san3 kernel: [153755.829298] ata1.00: cmd 61/18:88:8f:07:e8/00:00:0c:00:00/40 tag 17 ncq 12288 out
Apr  7 04:02:54 san3 kernel: [153755.829299]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  7 04:02:54 san3 kernel: [153755.829386] ata1.00: status: { DRDY }
Apr  7 04:02:54 san3 kernel: [153755.829415] ata1: hard resetting link
Apr  7 04:02:54 san3 tgtd: abort_task_set(988) found 10000a01 0
Apr  7 04:02:54 san3 kernel: [153756.312026] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr  7 04:02:54 san3 kernel: [153756.386453] ata1.00: configured for UDMA/133
Apr  7 04:02:54 san3 kernel: [153756.386548] ata1: EH complete
Apr  7 04:02:54 san3 kernel: [153756.386733] sd 1:0:0:0: [sdb] 2930277168 512-byte hardware sectors (1500302 MB)
Apr  7 04:02:54 san3 kernel: [153756.386841] sd 1:0:0:0: [sdb] Write Protect is off
Apr  7 04:02:54 san3 kernel: [153756.386889] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Apr  7 04:02:54 san3 kernel: [153756.386961] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr  7 04:02:54 san3 kernel: [153756.406006] tgtd[20545]: segfault at 31 ip 40c32d sp 75df00c0 error 4 in tgtd[400000+25000]
Apr  7 04:02:55 san3 tgtd: conn_close(100) connection closed, 0x257df38 2
Apr  7 04:02:55 san3 tgtd: conn_close(106) sesson 0x25906c0 1
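
The 13-second figure can be checked directly from the timestamps. The following is only a minimal sketch: the two log lines are copied from the excerpt above, the helper names (syslog_time, gap_seconds) are made up for illustration, and the year 2009 is assumed from the date of this mail because syslog does not record it.

```python
from datetime import datetime

# Two lines copied verbatim from the log above.
FIRST_TGTD_ABORT = "Apr  7 04:02:41 san3 tgtd: abort_task_set(988) found 10000a02 0"
FIRST_ATA_EXCEPTION = (
    "Apr  7 04:02:54 san3 kernel: [153755.828053] ata1.00: exception "
    "Emask 0x0 SAct 0x3f80f SErr 0x0 action 0x6 frozen"
)


def syslog_time(line, year=2009):
    """Parse the leading 'Mon dd HH:MM:SS' stamp of a classic syslog line.

    Syslog omits the year; 2009 is an assumption taken from the mail date.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def gap_seconds(earlier, later):
    """Seconds between the timestamps of two syslog lines."""
    return (syslog_time(later) - syslog_time(earlier)).total_seconds()


gap = gap_seconds(FIRST_TGTD_ABORT, FIRST_ATA_EXCEPTION)
print(f"tgtd logged its first abort {gap:.0f} s before the first ATA exception")
```

Syslog timestamps have one-second resolution, so the result is approximate. One possible reading (an assumption, not something this log proves) is that the drive had already stalled and the initiators gave up on their commands before the kernel's own command timeout expired.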

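The segfault line itself carries a bit more information than it seems. As a rough sketch (the function name decode_segfault is hypothetical, and the bit meanings are those of the x86 page-fault error code), it can be decoded like this:

```python
import re

# Message part of the kernel line from the log above.
SEGFAULT = (
    "tgtd[20545]: segfault at 31 ip 40c32d sp 75df00c0 "
    "error 4 in tgtd[400000+25000]"
)

PATTERN = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error ([0-9a-f]+) in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def decode_segfault(msg):
    """Split a kernel 'segfault at ...' message into its fields.

    All numbers in the message are hexadecimal. The error value is the
    x86 page-fault error code: bit 0 = page was present, bit 1 = write
    access, bit 2 = fault happened in user mode.
    """
    m = PATTERN.search(msg)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, err = (int(m.group(i), 16) for i in range(1, 5))
    base = int(m.group(6), 16)
    return {
        "fault_addr": addr,
        "ip": ip,
        "object": m.group(5),
        "offset": ip - base,  # offset of the faulting instruction in the mapping
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(SEGFAULT)
print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for k, v in info.items()})
```

Read this way, error 4 is a user-mode read from an unmapped page, and the fault address 0x31 is consistent with reading a field at a small offset from a NULL pointer, though the log alone cannot confirm that. Given an unstripped tgtd binary from the same tgt-0.9.5 build (its path is an assumption here, e.g. /usr/sbin/tgtd), something like "addr2line -f -e /usr/sbin/tgtd 0x40c32d" should map the instruction pointer to a function and source line.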



-- 
Tomasz Chmielewski
http://wpkg.org