[Stgt-devel] yet another tgtd iSCSI misbehaviour (aborted journal, remounting ro)

Tomasz Chmielewski mangoo
Wed Feb 6 10:45:54 CET 2008


It seems there is yet another problem (?) in tgtd.

It can easily be reproduced when the initiator crashes and then starts 
again. I have only tested it with diskless machines booted off iSCSI.

To reproduce:

1. Start tgtd and apply settings with tgtadm.
2. Start a diskless initiator:
  a) the diskless initiator fetches the kernel and the initrd via PXE/tftp
  b) the kernel executes the initrd; the initrd brings the interface up
  c) the initrd starts the iSCSI connection with the "iscsistart" command 
from open-iscsi
  d) we switch to the new root; the system boots fine
  e) IMPORTANT: the system now starts iscsid (/etc/init.d/open-iscsi start)

So far, everything works fine.

3. Now crash your initiator machine (e.g. press the reboot button)[1].

4. The initiator boots fine again; the connection is established with 
"iscsistart".

5. IMPORTANT: now start iscsid (/etc/init.d/open-iscsi start). The 
initiator reports "connection1:0: iscsi: detected conn error (1011)" 
and eventually drops the connection, remounts the filesystem read-only, 
and other scary things happen.

  a) There is a workaround: when the initiator reports 
"connection1:0: iscsi: detected conn error...", kill tgtd and start it 
again. The initiator then reconnects flawlessly.
  b) If you don't restart tgtd, the connection breaks and the filesystem 
is remounted read-only.


The issue does not happen with IET or SCST.

It looks like this:
- tgtd has an established connection with an initiator
- the initiator is killed, but tgtd still thinks it is connected
- the initiator reconnects from the same IP address
- starting iscsid on the initiator confuses tgtd, which then breaks 
and has to be restarted
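To make the stale-session guess above a bit more concrete, here is a toy model in Python. It is only a minimal sketch, assuming the problem is a pre-crash session that the target never tears down; the names (`ToyTarget`, `login`, `live_sessions`) and the IQN are hypothetical and have nothing to do with tgtd's actual code. In iSCSI (RFC 3720), a new login carrying the same initiator name and ISID as a session the target still considers active is supposed to implicitly log out that old session ("session reinstatement"). The model compares a target that does this with one that does not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    initiator_name: str  # initiator node name (IQN)
    isid: int            # session ID chosen by the initiator
    tsih: int            # session handle assigned by the target
    alive: bool = True   # does the target still consider it active?


class ToyTarget:
    """Hypothetical, simplified session table; NOT tgtd's implementation."""

    def __init__(self, reinstate: bool) -> None:
        self.reinstate = reinstate  # implement session reinstatement?
        self.sessions: list[Session] = []
        self._next_tsih = 1

    def login(self, initiator_name: str, isid: int) -> Session:
        # A leading login carries TSIH 0, so the target can only match it
        # against existing sessions by (initiator name, ISID).
        if self.reinstate:
            for old in self.sessions:
                if old.alive and (old.initiator_name, old.isid) == (initiator_name, isid):
                    old.alive = False  # implicitly log out the stale session
        new = Session(initiator_name, isid, self._next_tsih)
        self._next_tsih += 1
        self.sessions.append(new)
        return new

    def live_sessions(self, initiator_name: str) -> list[Session]:
        return [s for s in self.sessions
                if s.alive and s.initiator_name == initiator_name]


if __name__ == "__main__":
    iqn = "iqn.2008-02.org.example:diskless1"  # hypothetical initiator name
    for reinstate in (False, True):
        target = ToyTarget(reinstate)
        target.login(iqn, isid=1)  # first boot; the kexec reboot leaves it "open"
        target.login(iqn, isid=1)  # iscsistart after the reboot
        print(reinstate, len(target.live_sessions(iqn)))
    # False 2
    # True 1
```

In this model, a target without reinstatement keeps the pre-crash session registered next to the new one, which would match "tgtd still thinks the initiator is connected"; whether tgtd really behaves this way is not established by the sketch.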


Let me know if you need any of the following tcpdumps (if so, please tell 
me the exact tcpdump command-line options you would like me to use):

- step 2e): clean start of iscsid on the initiator
- step 5): iscsid start on the initiator when the connection breaks
- iscsid start on the initiator, with SCST as the target


[1] I use kexec here to reboot the machine because it has a buggy BIOS 
(an old Supermicro P4SBR/P4SBE server). Occasionally, at random, it does 
not reboot when a normal reboot command is used; the system shuts down 
but never comes back up. kexec is a nice workaround for that, but it 
doesn't close network sockets, so the target thinks we're still connected.


-- 
Tomasz Chmielewski
http://wpkg.org
