[Stgt-devel] disk kicked out of RAID -> tgtd segmentation fault
FUJITA Tomonori
fujita.tomonori
Sat Jul 12 07:30:55 CEST 2008
On Wed, 9 Jul 2008 17:23:19 +0900
FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> wrote:
> On Wed, 09 Jul 2008 10:16:41 +0200
> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
>
> > FUJITA Tomonori wrote:
> > > On Wed, 09 Jul 2008 08:36:32 +0200
> > > Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > >
> > >> FUJITA Tomonori wrote:
> > >>> On Wed, 09 Jul 2008 08:03:05 +0200
> > >>> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > >>>
> > >>>> FUJITA Tomonori wrote:
> > >>>>> On Mon, 30 Jun 2008 10:54:48 +0200
> > >>>>> Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> > >>>>>
> > >>>>>> Tomasz Chmielewski wrote:
> > >>>>>>> ronnie sahlberg wrote:
> > >>>>>>>> Hi Tomasz,
> > >>>>>>>>
> > >>>>>>>> I could not get that configuration to work.
> > >>>>>>>>
> > >>>>>>>> Can you please provide more detailed instructions exactly how to set
> > >>>>>>>> up hosts A B and C
> > >>>>>>>> so I can try to reproduce it.
> > >>>>>>>>
> > >>>>>>>> Please provide the exact commandline for each and every command I need
> > >>>>>>>> to run on the three hosts and I'll try to
> > >>>>>>>> reproduce it under gdb.
> > >>>>>>> A faulty RAID is just one way to crash tgtd.
> > >>>>>>>
> > >>>>>>> A simpler one is to just block the traffic between the target and the
> > >>>>>>> initiator - just login to the target, make sure there is some iSCSI
> > >>>>>>> traffic between the target and the initiator, then block incoming iSCSI
> > >>>>>>> traffic on the initiator with:
> > >>>>>>>
> > >>>>>>> initiator# iptables -I INPUT -s <target IP> -p tcp --sport 3260 -j DROP
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> After a while, you will see that only one tgtd process is running,
> > >>>>>>> whereas the second has crashed.
> > >>>>>> Note - the above seems to be valid if:
> > >>>>>>
> > >>>>>> - there are two initiators connected (from different IPs), perhaps more
> > >>>>>> - there is traffic from these two initiators
> > >>>>>> - we block traffic on one of these initiators
> > >>>>>>
> > >>>>>>
> > >>>>>> I couldn't reproduce the issue with only one initiator connected.
> > >>>>> Can you provide the detailed configuration?
> > >>>>>
> > >>>>> Do you mean:
> > >>>>>
> > >>>>> 1. there are three machines, say A, B, and C.
> > >>>> yes
> > >>>>
> > >>>>> 2. you run tgtd on A and setup one target in tgtd.
> > >>>> yes
> > >>>>
> > >>>>> 3. B and C work as an initiator. They connect to A. So the target on A
> > >>>>> has two sessions.
> > >>>> yes
> > >>>>
> > >>>>> Then you block the traffic between A and B, then tgtd on A dies?
> > >>>>>
> > >>>>> Right?
> > >>>> Yes, exactly like that.
> > >>>> I'm not sure if blocking traffic in both ways is needed, or is it
> > >>>> sufficient/needed to block the traffic from the initiator to the target
> > >>>> (and not from target to the initiator, i.e., -I OUTPUT chain).
> > >>> You block the traffic on the initiator and then on the target?
> > >> No, only on the initiator.
> > >>
> > >>
> > >>>>> I think that the output of tgtadm will enable us to understand your
> > >>>>> configuration easily.
> > >>>> What output?
> > >>> As I said, the output of tgtadm shows what tgtd has:
> > >>>
> > >>> Target 1: iqn.2001-04.org.osrg:viola
> > >>> System information:
> > >>> Driver: iscsi
> > >>> State: ready
> > >> Aah, this output.
> > >>
> > >> Nothing special there - two targets configured, each target has one
> > >> initiator coming from a different IP.
> > >
> > > Two targets? Hmm, I thought that you have one target machine and
> > > configure one target object.
> > >
> > > Please tell me about your target objects (configured in tgtd) and
> > > physical target machines.
> >
> > One target machine with two (or more) targets configured, like below;
> > here is the output - right now, only one initiator is connected; I can
> > reproduce the issue when a second initiator connects, but I can't do it
> > right now.
>
> In your configuration, a second initiator connects to target 2 or
> 3. Target 1 doesn't have two initiators, right? If so, it's a bit
> different from Ronnie's configuration.

OK, I think that you guys hit the same bug. I can reproduce it with
both configurations.

I think that the problem is that conn_close() calls
iscsi_free_cmd_task() against every task in conn->tx_clist. But
conn->tx_clist can also hold tasks that are not SCSI commands (like
NOOP), and we can't call cmd_hlist_remove() for such tasks.

Here's a fix. Can you try this?

diff --git a/usr/iscsi/conn.c b/usr/iscsi/conn.c
index 25ad170..2e83e7a 100644
--- a/usr/iscsi/conn.c
+++ b/usr/iscsi/conn.c
@@ -85,7 +85,7 @@ void conn_close(struct iscsi_connection *conn)
 	conn->tp->ep_close(conn);
-	dprintf("connection closed\n");
+	eprintf("connection closed %p\n", conn);
 	/* may not have been in FFP yet */
 	if (!conn->session)
@@ -100,28 +100,44 @@ void conn_close(struct iscsi_connection *conn)
 		if (task->conn != conn)
 			continue;
-		dprintf("Forcing release of pending task %" PRIx64 "\n",
-			task->tag);
+		eprintf("Forcing release of pending task %p %" PRIx64 "\n",
+			task, task->tag);
 		list_del(&task->c_list);
 		iscsi_free_task(task);
 	}
 	list_for_each_entry_safe(task, tmp, &conn->tx_clist, c_list) {
-		dprintf("Forcing release of tx task %" PRIx64 "\n",
-			task->tag);
-		iscsi_free_cmd_task(task);
+		uint8_t op;
+
+		op = task->req.opcode & ISCSI_OPCODE_MASK;
+
+		eprintf("Forcing release of tx task %p %" PRIx64 " %x\n",
+			task, task->tag, op);
+		switch (op) {
+		case ISCSI_OP_SCSI_CMD:
+			iscsi_free_cmd_task(task);
+			break;
+		case ISCSI_OP_NOOP_OUT:
+		case ISCSI_OP_LOGOUT:
+		case ISCSI_OP_SCSI_TMFUNC:
+			iscsi_free_task(task);
+			break;
+		default:
+			eprintf("%x\n", op);
+			break;
+		}
 	}
 	if (conn->rx_task) {
-		dprintf("Forcing release of rx task %" PRIx64 "\n",
-			conn->rx_task->tag);
+		eprintf("Forcing release of rx task %p %" PRIx64 "\n",
+			conn->rx_task, conn->rx_task->tag);
 		iscsi_free_task(conn->rx_task);
 	}
 	conn->rx_task = NULL;
 	if (conn->tx_task) {
-		dprintf("Forcing release of tx task %" PRIx64 "\n",
-			conn->tx_task->tag);
+		eprintf("Forcing release of tx task %p %" PRIx64 "\n",
+			conn->tx_task, conn->tx_task->tag);
 		iscsi_free_task(conn->tx_task);
 	}
 	conn->tx_task = NULL;
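
The idea behind the patch can be shown in isolation with a small
stand-alone sketch. This is not tgtd code: struct demo_task, the
demo_* functions and the cmd_hash_entries counter are hypothetical
stand-ins, and a singly linked list replaces conn->tx_clist. Only the
opcode values and the opcode mask are taken from the iSCSI
specification (RFC 3720). It assumes, as described above, that only
SCSI command tasks are ever entered in the command hash, so only those
may go through the "unhash, then free" path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Initiator opcodes and opcode mask as defined in RFC 3720. */
#define ISCSI_OPCODE_MASK     0x3f
#define ISCSI_OP_NOOP_OUT     0x00
#define ISCSI_OP_SCSI_CMD     0x01
#define ISCSI_OP_SCSI_TMFUNC  0x02
#define ISCSI_OP_LOGOUT       0x06

/* Hypothetical, simplified task; not tgtd's struct iscsi_task. */
struct demo_task {
	uint8_t opcode;          /* first byte of the request header */
	int in_cmd_hash;         /* set only for SCSI command tasks */
	struct demo_task *next;  /* stand-in for the tx_clist linkage */
};

/* Stand-in for the command hash: just counts hashed tasks. */
int cmd_hash_entries;

struct demo_task *demo_task_new(uint8_t opcode, struct demo_task *next)
{
	struct demo_task *t = calloc(1, sizeof(*t));

	if (!t)
		abort();
	t->opcode = opcode;
	t->next = next;
	if ((opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_SCSI_CMD) {
		t->in_cmd_hash = 1;
		cmd_hash_entries++;
	}
	return t;
}

/* SCSI command path: remove from the hash, then free. */
void demo_free_cmd_task(struct demo_task *t)
{
	/* A NOOP or LOGOUT task was never hashed and would trip this. */
	assert(t->in_cmd_hash);
	cmd_hash_entries--;
	free(t);
}

/* Generic path for tasks that were never hashed. */
void demo_free_task(struct demo_task *t)
{
	free(t);
}

/* Release every task on the list; returns how many were unhashed. */
int demo_release_all(struct demo_task *head)
{
	int unhashed = 0;

	while (head) {
		struct demo_task *next = head->next; /* save before freeing */

		/* The mask strips the immediate bit (0x40) from the opcode. */
		if ((head->opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_SCSI_CMD) {
			demo_free_cmd_task(head);
			unhashed++;
		} else {
			demo_free_task(head);
		}
		head = next;
	}
	return unhashed;
}
```

Unlike the patch, which only logs opcodes it does not recognise, the
sketch sends every non-command task through the generic free path to
keep the example short.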