[stgt] tgtd segfault with software RAID, hard resetting link
Tomasz Chmielewski
mangoo at wpkg.org
Tue Apr 21 14:06:27 CEST 2009
FUJITA Tomonori wrote:
>> So I rebooted the initiator without logging it out of the target, with
>> "echo b >/proc/sysrq-trigger" (it's a diskless initiator, so basically
>> that's the only method when its disks are gone).
>>
>>
>> Initiator started to boot again and I think tgtd segfaulted when the
>> initiator tried to log in to the target.
>
> Duh, seems that we have another problem.
>
> Can you reproduce this by just rebooting the initiator without logging
> out and starting the initiator again?
It seems to be harder to cause it on purpose... but yes, it's reproducible.
It may or may not be a different problem.
1) On initiator, do:
# echo 3 > /sys/block/sda/device/timeout
# echo 3 > /sys/block/sdd/device/timeout
# dd if=/dev/zero of=/mnt/iscsi/bigfile bs=64k
2) On the target, do (/dev/sdd is a member of the software RAID array):
# i=1; while [ $i -ne 100 ] ; do echo $i; hdparm -Y /dev/sdd; i=$((i+1)); done
3) If I/O errors appear on the initiator (this seems important), reboot it without logging out of the target:
# echo b >/proc/sysrq-trigger
4) Initiator will start booting and will connect to the target.
It won't be able to finish booting (the hdparm loop is still running on the target, and some data is still in cache/dirty/writeback).
Interrupt the loop; with some luck, tgtd _may_ segfault.
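
For repeated attempts, step 1 can be wrapped in a small helper that also prints the previous timeout, so it can be restored afterwards (the default is commonly 30 seconds for SCSI disks, but check the printed value). This is only a sketch: set_scsi_timeout and SYSFS_ROOT are names made up for illustration, and the device names are the ones from this report.

```shell
#!/bin/sh
# Sketch: set the SCSI command timeout on several disks and print the old values.
# SYSFS_ROOT and set_scsi_timeout are illustrative names, not part of any tool.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

set_scsi_timeout() {
    secs=$1
    shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/timeout"
        # Skip devices that do not exist or that we may not write to.
        if [ ! -w "$f" ]; then
            echo "skip $dev: $f not writable" >&2
            continue
        fi
        old=$(cat "$f")
        echo "$secs" > "$f"
        echo "$dev: $old -> $secs"
    done
}

# Usage (as root on the initiator): set_scsi_timeout 3 sda sdd
```

SYSFS_ROOT is overridable only so the helper can be tried against a scratch directory before pointing it at the real /sys.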
--------------------
While trying to reproduce it, I ran the following on the initiator (both sda and sdd are iSCSI disks):
# echo 3 > /sys/block/sda/device/timeout
# echo 3 > /sys/block/sdd/device/timeout
Then, on the target:
# i=1; while [ $i -ne 100 ] ; do echo $i; hdparm -Y /dev/sdd; i=$((i+1)); done
It segfaulted after ~30 iterations (this happened only once; no initiator reboot was needed):
Apr 21 13:26:07 megathecus tgtd: conn_close(100) connection closed, 0x81a791c 1
Apr 21 13:27:12 megathecus tgtd: abort_task_set(988) found 51 0
Apr 21 13:27:12 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:12 megathecus tgtd: abort_cmd(964) found 21 e
Apr 21 13:27:20 megathecus tgtd: abort_task_set(988) found 39 0
Apr 21 13:27:20 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:20 megathecus tgtd: abort_cmd(964) found 73 e
Apr 21 13:27:41 megathecus tgtd: conn_close(100) connection closed, 0x81a791c 3
Apr 21 13:27:41 megathecus tgtd: conn_close(106) sesson 0x81a7d70 1
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000051 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000041 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000050 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000043 e
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000040 e
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000045 e
Apr 21 13:27:58 megathecus tgtd: abort_task_set(988) found 10000072 0
Apr 21 13:27:58 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:58 megathecus tgtd: abort_cmd(964) found 1000007e e
Apr 21 13:28:07 megathecus tgtd: conn_close(100) connection closed, 0x81a700c 4
Apr 21 13:28:07 megathecus tgtd: conn_close(106) sesson 0x81a71f0 1
Apr 21 13:28:09 megathecus kernel: tgtd[21360]: segfault at 0 ip 080546d6 sp 6cc1c340 error 4 in tgtd[8048000+24000]
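
The kernel line above can be decoded a bit further. The sketch below assumes the usual x86 message layout (fault address, ip, sp, page-fault error code in hex, then mapping base+size); decode_segfault is a hypothetical helper name, not part of tgt. It prints the faulting address, the offset of ip inside the tgtd mapping, and the meaning of the error-code bits.

```shell
#!/bin/sh
# Sketch: decode an x86 kernel "segfault at ..." line (32-bit addresses assumed).
# decode_segfault is a hypothetical helper, not part of tgt.
decode_segfault() {
    line=$1
    # Pull the hex fields out of the message.
    addr=$(printf '%s\n' "$line" | sed -n 's/.*segfault at \([0-9a-f]*\) ip .*/\1/p')
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) sp .*/\1/p')
    err=$(printf '%s\n' "$line" | sed -n 's/.* error \([0-9a-f]*\) in .*/\1/p')
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\].*/\1/p')
    if [ -z "$addr" ] || [ -z "$ip" ] || [ -z "$err" ] || [ -z "$base" ]; then
        echo "unrecognized segfault line" >&2
        return 1
    fi
    code=$((0x$err))
    printf 'fault_addr=0x%x\n' "$((0x$addr))"
    printf 'offset=0x%x\n' "$((0x$ip - 0x$base))"
    # x86 page-fault error code bits:
    #   1 = protection fault (else page not present)
    #   2 = write access     (else read)
    #   4 = user mode        (else kernel mode)
    [ $((code & 1)) -ne 0 ] && echo "page=protection" || echo "page=not-present"
    [ $((code & 2)) -ne 0 ] && echo "access=write" || echo "access=read"
    [ $((code & 4)) -ne 0 ] && echo "mode=user" || echo "mode=kernel"
}

decode_segfault 'tgtd[21360]: segfault at 0 ip 080546d6 sp 6cc1c340 error 4 in tgtd[8048000+24000]'
```

For this log line that is a user-mode read of an unmapped page at address 0, i.e. most likely a NULL pointer dereference at offset 0xc6d6 in tgtd. If the binary is not position-independent (the 8048000 base suggests so) and still has symbols, something like "addr2line -f -e /path/to/tgtd 0x080546d6" should name the function; the path is a placeholder.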
--
Tomasz Chmielewski
http://wpkg.org