[Stgt-devel] disk kicked out of RAID -> tgtd segmentation fault

FUJITA Tomonori fujita.tomonori
Sat Jul 12 07:30:55 CEST 2008


On Wed, 9 Jul 2008 17:23:19 +0900
FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> wrote:

> On Wed, 09 Jul 2008 10:16:41 +0200
> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> 
> > FUJITA Tomonori wrote:
> > > On Wed, 09 Jul 2008 08:36:32 +0200
> > > Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > > 
> > >> FUJITA Tomonori wrote:
> > >>> On Wed, 09 Jul 2008 08:03:05 +0200
> > >>> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > >>>
> > >>>> FUJITA Tomonori wrote:
> > >>>>> On Mon, 30 Jun 2008 10:54:48 +0200
> > >>>>> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > >>>>>
> > >>>>>> Tomasz Chmielewski wrote:
> > >>>>>>> ronnie sahlberg wrote:
> > >>>>>>>> Hi Tomasz,
> > >>>>>>>>
> > >>>>>>>> I could not get that configuration to work.
> > >>>>>>>>
> > >>>>>>>> Can you please provide more detailed instructions exactly how to set
> > >>>>>>>> up hosts A B and C
> > >>>>>>>> so I can try to reproduce it.
> > >>>>>>>>
> > >>>>>>>> Please provide the exact commandline for each and every command I need
> > >>>>>>>> to run on the three hosts and I'll try to
> > >>>>>>>> reproduce it under gdb.
> > >>>>>>> A faulty RAID is just one way to crash tgtd.
> > >>>>>>>
> > >>>>>>> A simpler one is to just block the traffic between the target and the 
> > >>>>>>> initiator - just log in to the target, make sure there is some iSCSI 
> > >>>>>>> traffic between the target and the initiator, then block incoming iSCSI 
> > >>>>>>> traffic on the initiator with:
> > >>>>>>>
> > >>>>>>> initiator# iptables -I INPUT -s <target IP> -p tcp --sport 3260 -j DROP
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> After a while, you will see that only one tgtd process is running, 
> > >>>>>>> whereas the second has crashed.
> > >>>>>> Note - the above seems to be valid if:
> > >>>>>>
> > >>>>>> - there are two initiators connected (from different IPs), perhaps more
> > >>>>>> - there is traffic from these two initiators
> > >>>>>> - we block traffic on one of these initiators
> > >>>>>>
> > >>>>>>
> > >>>>>> I couldn't reproduce the issue with only one initiator connected.
> > >>>>> Can you provide the detailed configuration?
> > >>>>>
> > >>>>> Do you mean:
> > >>>>>
> > >>>>> 1. there are three machines, say A, B, and C.
> > >>>> yes
> > >>>>
> > >>>>> 2. you run tgtd on A and set up one target in tgtd.
> > >>>> yes
> > >>>>
> > >>>>> 3. B and C work as initiators. They connect to A. So the target on A
> > >>>>> has two sessions.
> > >>>> yes
> > >>>>
> > >>>>> Then you block the traffic between A and B, then tgtd on A dies?
> > >>>>>
> > >>>>> Right?
> > >>>> Yes, exactly like that.
> > >>>> I'm not sure if blocking traffic in both ways is needed, or is it 
> > >>>> sufficient/needed to block the traffic from the initiator to the target 
> > >>>> (and not from target to the initiator, i.e., -I OUTPUT chain).
> > >>> You block the traffic on the initiator and then on the target?
> > >> No, only on the initiator.
> > >>
> > >>
> > >>>>> I think that the output of tgtadm will enable us to understand your
> > >>>>> configuration easily.
> > >>>> What output?
> > >>> As I said, the output of tgtadm shows what tgtd has:
> > >>>
> > >>> Target 1: iqn.2001-04.org.osrg:viola
> > >>>     System information:
> > >>>         Driver: iscsi
> > >>>         State: ready
> > >> Aah, this output.
> > >>
> > >> Nothing special there - two targets configured, each target has one 
> > >> initiator coming from a different IP.
> > > 
> > > Two targets? Hmm, I thought that you have one target machine and
> > > configure one target object.
> > > 
> > > Please tell me about your target objects (configured in tgtd) and
> > > physical target machines.
> > 
> > One target machine with two (or more) targets configured, like below; 
> > here is the output - right now, only one initiator is connected; I can 
> > reproduce the issue when a second initiator connects, but I can't do it 
> > right now.
> 
> In your configuration, a second initiator connects to target 2 or
> 3. Target 1 doesn't have two initiators, right? If so, it's a bit
> different from Ronnie's configuration.

OK, I think that you guys hit the same bug. I can reproduce it with
both configurations.

I think that the problem is that conn_close() calls
iscsi_free_cmd_task() for every task in conn->tx_clist. But
conn->tx_clist can also hold non-SCSI-command tasks (like NOOP), and we
can't call cmd_hlist_remove() for such tasks.

Here's a fix. Can you try this?

diff --git a/usr/iscsi/conn.c b/usr/iscsi/conn.c
index 25ad170..2e83e7a 100644
--- a/usr/iscsi/conn.c
+++ b/usr/iscsi/conn.c
@@ -85,7 +85,7 @@ void conn_close(struct iscsi_connection *conn)
 
 	conn->tp->ep_close(conn);
 
-	dprintf("connection closed\n");
+	eprintf("connection closed %p\n", conn);
 
 	/* may not have been in FFP yet */
 	if (!conn->session)
@@ -100,28 +100,44 @@ void conn_close(struct iscsi_connection *conn)
 		if (task->conn != conn)
 			continue;
 
-		dprintf("Forcing release of pending task %" PRIx64 "\n",
-			task->tag);
+		eprintf("Forcing release of pending task %p %" PRIx64 "\n",
+			task, task->tag);
 		list_del(&task->c_list);
 		iscsi_free_task(task);
 	}
 
 	list_for_each_entry_safe(task, tmp, &conn->tx_clist, c_list) {
-		dprintf("Forcing release of tx task %" PRIx64 "\n",
-			task->tag);
-		iscsi_free_cmd_task(task);
+		uint8_t op;
+
+		op = task->req.opcode & ISCSI_OPCODE_MASK;
+
+		eprintf("Forcing release of tx task %p %" PRIx64 " %x\n",
+			task, task->tag, op);
+		switch (op) {
+		case ISCSI_OP_SCSI_CMD:
+			iscsi_free_cmd_task(task);
+			break;
+		case ISCSI_OP_NOOP_OUT:
+		case ISCSI_OP_LOGOUT:
+		case ISCSI_OP_SCSI_TMFUNC:
+			iscsi_free_task(task);
+			break;
+		default:
+			eprintf("%x\n", op);
+			break;
+		}
 	}
 
 	if (conn->rx_task) {
-		dprintf("Forcing release of rx task %" PRIx64 "\n",
-			conn->rx_task->tag);
+		eprintf("Forcing release of rx task %p %" PRIx64 "\n",
+			conn->rx_task, conn->rx_task->tag);
 		iscsi_free_task(conn->rx_task);
 	}
 	conn->rx_task = NULL;
 
 	if (conn->tx_task) {
-		dprintf("Forcing release of tx task %" PRIx64 "\n",
-			conn->tx_task->tag);
+		eprintf("Forcing release of tx task %p %" PRIx64 "\n",
+			conn->tx_task, conn->tx_task->tag);
 		iscsi_free_task(conn->tx_task);
 	}
 	conn->tx_task = NULL;


