FUJITA Tomonori wrote:
>> So I rebooted the initiator without logging it out of the target, with
>> "echo b >/proc/sysrq-trigger" (it's a diskless initiator, so basically
>> that's the only method when its disks are gone).
>>
>>
>> Initiator started to boot again and I think tgtd segfaulted when the
>> initiator tried to log in to the target.
>
> Duh, seems that we have another problem.
>
> Can you reproduce this by just rebooting the initiator without logging
> out and starting the initiator again?

It seems to be harder to trigger on purpose... but yes, it's reproducible.
It may or may not be a different problem.

1) On the initiator, do:

# echo 3 > /sys/block/sda/device/timeout
# echo 3 > /sys/block/sdd/device/timeout
# dd if=/dev/zero of=/mnt/iscsi/bigfile bs=64k

2) On the target, do (the drive is part of a RAID array):

# i=1; while [ $i -ne 100 ] ; do echo $i; hdparm -Y /dev/sdd; i=$((i+1)); done

3) If I/O errors appear on the initiator (this seems important), reboot it
without logging out of the target:

# echo b >/proc/sysrq-trigger

4) The initiator will start booting and will connect to the target. It won't
be able to boot (the hdparm loop is still running on the target, and some
data is still in cache/dirty/writeback). Interrupt the loop; with luck, tgtd
_may_ segfault.
--------------------

While trying to reproduce it, I did the following on the initiator (both are
iSCSI disks):

# echo 3 > /sys/block/sda/device/timeout
# echo 3 > /sys/block/sdd/device/timeout

Then, on the target:

i=1; while [ $i -ne 100 ] ; do echo $i; hdparm -Y /dev/sdd; i=$((i+1)); done

And tgtd segfaulted after ~30 iterations (this happened only once; no
initiator reboot was needed):

Apr 21 13:26:07 megathecus tgtd: conn_close(100) connection closed, 0x81a791c 1
Apr 21 13:27:12 megathecus tgtd: abort_task_set(988) found 51 0
Apr 21 13:27:12 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:12 megathecus tgtd: abort_cmd(964) found 21 e
Apr 21 13:27:20 megathecus tgtd: abort_task_set(988) found 39 0
Apr 21 13:27:20 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:20 megathecus tgtd: abort_cmd(964) found 73 e
Apr 21 13:27:41 megathecus tgtd: conn_close(100) connection closed, 0x81a791c 3
Apr 21 13:27:41 megathecus tgtd: conn_close(106) sesson 0x81a7d70 1
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000051 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000041 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 10000050 0
Apr 21 13:27:47 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000043 e
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000040 e
Apr 21 13:27:47 megathecus tgtd: abort_cmd(964) found 10000045 e
Apr 21 13:27:58 megathecus tgtd: abort_task_set(988) found 10000072 0
Apr 21 13:27:58 megathecus tgtd: abort_task_set(988) found 0 0
Apr 21 13:27:58 megathecus tgtd: abort_cmd(964) found 1000007e e
Apr 21 13:28:07 megathecus tgtd: conn_close(100) connection closed, 0x81a700c 4
Apr 21 13:28:07 megathecus tgtd: conn_close(106) sesson 0x81a71f0 1
Apr 21 13:28:09 megathecus kernel: tgtd[21360]: segfault at 0 ip 080546d6 sp 6cc1c340 error 4 in tgtd[8048000+24000]


-- 
Tomasz Chmielewski
http://wpkg.org