[stgt] tgtd segfault during heavy I/O

Kiefer Chang zapchang at gmail.com
Mon Jul 18 14:26:07 CEST 2011


Dear Tomonori,

I applied the patch. The symptom still exists.
I add 2 print statements in the patch. I can't find it in the log
after tgtd segfault.
Please download the log if you need:
http://dl.dropbox.com/u/8354750/tgtd/20110718/messages.gz

the code:

diff --git a/Makefile b/Makefile
index 5df544d..2712b86 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-VERSION ?= 1.0.16
+VERSION ?= 1.0.16.0718

 CHECK_CC = cgcc
 CHECK_CC_FLAGS = '$(CHECK_CC) -Wbitwise -Wno-return-void -no-compile $(ARCH)'
diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
index ee17ef3..5c6ff34 100644
--- a/usr/iscsi/iscsid.c
+++ b/usr/iscsi/iscsid.c
@@ -1200,6 +1200,13 @@ static int iscsi_scsi_cmd_done(uint64_t nid,
int result, struct scsi_cmd *scmd)
 	struct iscsi_task *task = ITASK(scmd);
 	uint32_t read_len = scsi_get_in_length(scmd);

+	if (result == TASK_ABORTED) {
+		eprintf("TASK_ABORTED, remove task %p\n", task);
+		list_del(&task->c_hlist);
+		iscsi_free_task(task);
+		return 0;
+	}
+
 	/*
 	 * Since the connection is closed we just free the task.
 	 * We could delay the closing of the conn in some cases and send
diff --git a/usr/target.c b/usr/target.c
index 24efb13..9fb1d18 100644
--- a/usr/target.c
+++ b/usr/target.c
@@ -1134,7 +1134,10 @@ static int abort_cmd(struct target* target,
struct mgmt_req *mreq,
 		cmd->mreq = mreq;
 		err = -EBUSY;
 	} else {
-		cmd->dev->cmd_done(target, cmd);
+		eprintf("remove cmd tag %" PRIx64 "\n", cmd->tag);
+		cmd_hlist_remove(cmd);
+		list_del(&cmd->qlist);
+
 		target_cmd_io_done(cmd, TASK_ABORTED);
 	}
 	return err;



Thanks a lot.



2011/7/16 Kiefer Chang <zapchang at gmail.com>:
> Dear Tomonori,
>
> I will try it. Thanks!
>
> By the way, using single thread for each LUN seems avoid the symptom
> (at least before weekend...)
> I will report if there are any progresses.
>
> 2011/7/16 FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp>:
>> On Thu, 14 Jul 2011 16:52:42 +0900
>> FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> wrote:
>>
>>> On Tue, 12 Jul 2011 17:31:30 -0700
>>> Andy Grover <agrover at redhat.com> wrote:
>>>
>>> > We are also seeing this issue reported, yes based on aborting tasks:
>>> >
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=719687
>>> >
>>> > From looking at the code, it looks like target_cmd_io_done() may be
>>> > called twice for the same command, which leads to iscsi_scsi_cmd_done
>>> > being called twice, and double-freeing the iscsi_task?
>>> >
>>> > 1st: abort_task_set -> abort_cmd -> target_cmd_io_done
>>> > 2nd: abort_task_set -> abort_cmd -> cmd->dev->cmd_done() [__cmd_done] ->
>>> > post_cmd_done -> target_cmd_io_done
>>>
>>> Yeah, I think that you are right. Surely, that code looks buggy.
>>>
>>> The command that is not in the 'processed' state should live in
>>> tgt_cmd_queue. So what we need to do is that unlinking the command
>>> from the queue and freeing the command resource.
>>
>> How about the following patch? Kiefer, Can you try this?
>>
>> diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
>> index 22a21cc..c970a49 100644
>> --- a/usr/iscsi/iscsid.c
>> +++ b/usr/iscsi/iscsid.c
>> @@ -1209,6 +1209,12 @@ static int iscsi_scsi_cmd_done(uint64_t nid, int result, struct scsi_cmd *scmd)
>>        struct iscsi_task *task = ITASK(scmd);
>>        uint32_t read_len = scsi_get_in_length(scmd);
>>
>> +       if (result == TASK_ABORTED) {
>> +               list_del(&task->c_hlist);
>> +               iscsi_free_task(task);
>> +               return 0;
>> +       }
>> +
>>        /*
>>         * Since the connection is closed we just free the task.
>>         * We could delay the closing of the conn in some cases and send
>> diff --git a/usr/target.c b/usr/target.c
>> index 21575dc..786bbb9 100644
>> --- a/usr/target.c
>> +++ b/usr/target.c
>> @@ -1134,7 +1134,9 @@ static int abort_cmd(struct target* target, struct mgmt_req *mreq,
>>                cmd->mreq = mreq;
>>                err = -EBUSY;
>>        } else {
>> -               cmd->dev->cmd_done(target, cmd);
>> +               cmd_hlist_remove(cmd);
>> +               list_del(&cmd->qlist);
>> +
>>                target_cmd_io_done(cmd, TASK_ABORTED);
>>        }
>>        return err;
>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



More information about the stgt mailing list