It seems there is yet another problem (?) in tgtd.

It can be easily reproduced when the initiator crashes and then starts again. I have tested it only with diskless machines booted off iSCSI.

To reproduce:

1. Start tgtd and apply the settings with tgtadm.

2. Start a diskless initiator:

   a) the diskless initiator fetches the kernel and the initrd via PXE/tftp
   b) the kernel executes the initrd; the initrd brings the interface up
   c) the initrd starts the iSCSI connection with the "iscsistart" command from open-iscsi
   d) we switch to the new root, and the system boots fine
   e) IMPORTANT - the system now starts iscsid (/etc/init.d/open-iscsi start)

   So far, everything is fine and unproblematic.

3. Now crash your initiator machine (e.g. press the reboot button) [1].

4. The initiator starts just fine again - the connection is established with "iscsistart".

5. IMPORTANT - start iscsid now (/etc/init.d/open-iscsi start).
   The initiator will report "connection1:0: iscsi: detected conn error (1011)" and will eventually break the connection, remount the fs read-only, etc.; scary things will happen.

   a) there is a workaround: when the initiator reports "connection1:0: iscsi: detected conn error...", kill tgtd and start it again. The initiator will then reconnect flawlessly.
   b) if you don't kill/start tgtd again, the connection will break and the fs will be remounted ro.

The issue does not happen with IET or SCST.

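To make the suspected sequence concrete, here is a toy Python model of steps 2 to 5. It is only a sketch and is NOT tgtd code: ToyTarget, login, clean_logout, crash_and_reconnect, the "reinstate" switch, and the initiator name are all made up for illustration. It assumes the target keys sessions by initiator name plus ISID (roughly how RFC 3720 identifies a session) and that a crashed initiator logs in again with the same pair without the old connection ever being closed. I have not checked whether tgtd actually behaves like the "reinstate=False" case.

```python
# Toy model only: this is NOT tgtd code, and every name here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyTarget:
    # Sessions keyed by (initiator name, ISID); value is the list of connections.
    sessions: dict = field(default_factory=dict)
    # If True, a duplicate login replaces the old session (session reinstatement).
    reinstate: bool = False

    def login(self, initiator, isid, conn_id):
        key = (initiator, isid)
        if key in self.sessions and not self.reinstate:
            # The dead connection and the new one now both claim this session.
            self.sessions[key].append(conn_id)
        else:
            # First login, or reinstatement: old connections are dropped.
            self.sessions[key] = [conn_id]
        return list(self.sessions[key])

    def clean_logout(self, initiator, isid):
        # What a normal shutdown would trigger; a crash never reaches this.
        self.sessions.pop((initiator, isid), None)


def crash_and_reconnect(target):
    """Log in, 'reboot' without closing the socket, then log in again."""
    target.login("iqn.example:diskless", isid=1, conn_id="before-crash")
    # Crash/kexec: no logout and no socket close, so the target is never told.
    return target.login("iqn.example:diskless", isid=1, conn_id="after-crash")
```

With ToyTarget() the second login leaves both connections attached to the same session, which is the kind of confusion I suspect; with ToyTarget(reinstate=True) only the new connection survives.
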
It looks like:

- tgtd has an established connection with an initiator
- the initiator is killed, but tgtd still thinks the initiator is connected to it
- the initiator connects again from the same IP address
- when we start iscsid on the initiator, it confuses tgtd; tgtd breaks and has to be restarted

Let me know if you need tcpdumps of any of the following (if so, please give me all the tcpdump command line options you would use):

- point 2e) - clean start of iscsid on the initiator
- point 5) - iscsid start on the initiator when the connection breaks
- iscsid start on the initiator when the target is SCST


[1] I use kexec here to reboot the machine because it has a buggy BIOS (an old Supermicro P4SBR/P4SBE server). Randomly, it doesn't reboot when a normal reboot command is used; the system shuts down, but never reboots. kexec is a nice workaround for that, but it doesn't close network sockets, so the target thinks we're still connected.


--
Tomasz Chmielewski
http://wpkg.org