[stgt] tgtd stuck in infinite loop

Elliot Peele elliot at rpath.com
Fri Feb 19 08:14:19 CET 2010


On Feb 18, 2010, at 10:33 PM, FUJITA Tomonori wrote:

> On Thu, 18 Feb 2010 15:28:53 -0500
> Elliot Peele <elliot at rpath.com> wrote:
> 
>> I am running tgt 1.0.1, serving four iSCSI targets to an ESX 4i cluster.
>> 
>> I have seen the daemon get into an infinite loop a couple of times and stop responding to any requests. From the looks of it, in iscsi_data_out_rx_start the cmd_list ends up looping back on itself.
>> 
>> (gdb) l -
>> 1478    static int iscsi_data_out_rx_start(struct iscsi_connection *conn)
>> 1479    {
>> 1480            struct iscsi_task *task;
>> 1481            struct iscsi_data *req = (struct iscsi_data *) &conn->req.bhs;
>> 1482
>> 1483            list_for_each_entry(task, &conn->session->cmd_list, c_hlist) {
>> 1484                    if (task->tag == req->itt)
>> 1485                            goto found;
>> 1486            }
>> 1487            return -EINVAL;
>> (gdb) p conn->session->cmd_list->next
>> $9 = (struct list_head *) 0x785330
>> (gdb) p &conn->session->cmd_list->next
>> $10 = (struct list_head **) 0x690fb8
>> (gdb) p &conn->session->cmd_list->next->next
>> $11 = (struct list_head **) 0x785330
> 
> Hmm, I have no idea how the list is corrupted. I need to dig into the
> code.
> 
> 
>> Has anyone seen this behavior before? Is there any more information that I can provide?
> 
> I've not seen any reports that might be related to this.
> 
> Is there an easy way to reproduce this? Are there any notable events
> such as aborting tasks, disconnection, etc before hitting this?

I don't have an easy way to reproduce this. The two occurrences were a few weeks apart and it isn't clear what triggered them.

There was an abort and a disconnect right before the daemon went into the loop:

daemon.err<27>: Feb 17 13:10:12 localhost tgtd: abort_task_set(1008) found 750fd7 0
daemon.err<27>: Feb 17 13:10:12 localhost tgtd: conn_close(100) connection closed, 0x691618 31
daemon.err<27>: Feb 17 13:10:12 localhost tgtd: conn_close(106) sesson 0x654520 1

Is there any information that would be helpful for debugging if/when this happens again?
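
One thing I could capture next time is whether a walk of cmd_list ever gets back to the list head, and at which node the next/prev links stop agreeing. Below is a minimal sketch of such a check, assuming a kernel-style circular doubly linked list with next as the first member. The struct here is a stand-in rather than tgt's own header, and list_init and list_check are hypothetical helper names, not existing tgt functions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a kernel-style list node (assumed layout, not tgt's header). */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points to itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Append node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Hypothetical helper: follow next pointers for at most max_nodes entries.
 * Returns the number of entries if the walk returns to head and every
 * next/prev pair is consistent. Returns -1 if a link is NULL, if
 * next->prev does not point back at the current node, or if the walk
 * does not reach head within the bound.
 */
static int list_check(const struct list_head *head, int max_nodes)
{
    const struct list_head *pos = head;
    int count = 0;

    while (count <= max_nodes) {
        const struct list_head *next = pos->next;

        if (next == NULL || next->prev != pos)
            return -1;
        if (next == head)
            return count;
        pos = next;
        count++;
    }
    return -1;
}
```

A node whose next pointer refers back to itself makes list_check return -1 instead of spinning, which is the condition list_for_each_entry cannot detect on its own. The bound keeps the check safe to call even when the list is already corrupted.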

Elliot

> --
> To unsubscribe from this list: send the line "unsubscribe stgt" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Elliot Peele
rPath, Inc.
elliot at rpath.com






