[stgt] tgtd segfault during heavy I/O

James R. Leu jleu at inoc.com
Mon Jul 18 14:58:28 CEST 2011


Kiefer,

How are you configuring tgtd to use a single thread for each LUN?

On Mon, Jul 18, 2011 at 08:26:07PM +0800, Kiefer Chang wrote:
> Dear Tomonori,
> 
> I applied the patch. The symptom still exists.
> I add 2 print statements in the patch. I can't find it in the log
> after tgtd segfault.
> Please download the log if you need:
> http://dl.dropbox.com/u/8354750/tgtd/20110718/messages.gz
> 
> the code:
> 
> diff --git a/Makefile b/Makefile
> index 5df544d..2712b86 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,4 +1,4 @@
> -VERSION ?= 1.0.16
> +VERSION ?= 1.0.16.0718
> 
>  CHECK_CC = cgcc
>  CHECK_CC_FLAGS = '$(CHECK_CC) -Wbitwise -Wno-return-void -no-compile $(ARCH)'
> diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
> index ee17ef3..5c6ff34 100644
> --- a/usr/iscsi/iscsid.c
> +++ b/usr/iscsi/iscsid.c
> @@ -1200,6 +1200,13 @@ static int iscsi_scsi_cmd_done(uint64_t nid,
> int result, struct scsi_cmd *scmd)
>  	struct iscsi_task *task = ITASK(scmd);
>  	uint32_t read_len = scsi_get_in_length(scmd);
> 
> +	if (result == TASK_ABORTED) {
> +		eprintf("TASK_ABORTED, remove task %p\n", task);
> +		list_del(&task->c_hlist);
> +		iscsi_free_task(task);
> +		return 0;
> +	}
> +
>  	/*
>  	 * Since the connection is closed we just free the task.
>  	 * We could delay the closing of the conn in some cases and send
> diff --git a/usr/target.c b/usr/target.c
> index 24efb13..9fb1d18 100644
> --- a/usr/target.c
> +++ b/usr/target.c
> @@ -1134,7 +1134,10 @@ static int abort_cmd(struct target* target,
> struct mgmt_req *mreq,
>  		cmd->mreq = mreq;
>  		err = -EBUSY;
>  	} else {
> -		cmd->dev->cmd_done(target, cmd);
> +		eprintf("remove cmd tag %" PRIx64 "\n", cmd->tag);
> +		cmd_hlist_remove(cmd);
> +		list_del(&cmd->qlist);
> +
>  		target_cmd_io_done(cmd, TASK_ABORTED);
>  	}
>  	return err;
> 
> 
> 
> Thanks a lot.
> 
> 
> 
> 2011/7/16 Kiefer Chang <zapchang at gmail.com>:
> > Dear Tomonori,
> >
> > I will try it. Thanks!
> >
> > By the way, using single thread for each LUN seems avoid the symptom
> > (at least before weekend...)
> > I will report if there are any progresses.
> >
> > 2011/7/16 FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp>:
> >> On Thu, 14 Jul 2011 16:52:42 +0900
> >> FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> wrote:
> >>
> >>> On Tue, 12 Jul 2011 17:31:30 -0700
> >>> Andy Grover <agrover at redhat.com> wrote:
> >>>
> >>> > We are also seeing this issue reported, yes based on aborting tasks:
> >>> >
> >>> > https://bugzilla.redhat.com/show_bug.cgi?id=719687
> >>> >
> >>> > From looking at the code, it looks like target_cmd_io_done() may be
> >>> > called twice for the same command, which leads to iscsi_scsi_cmd_done
> >>> > being called twice, and double-freeing the iscsi_task?
> >>> >
> >>> > 1st: abort_task_set -> abort_cmd -> target_cmd_io_done
> >>> > 2nd: abort_task_set -> abort_cmd -> cmd->dev->cmd_done() [__cmd_done] ->
> >>> > post_cmd_done -> target_cmd_io_done
> >>>
> >>> Yeah, I think that you are right. Surely, that code looks buggy.
> >>>
> >>> The command that is not in the 'processed' state should live in
> >>> tgt_cmd_queue. So what we need to do is that unlinking the command
> >>> from the queue and freeing the command resource.
> >>
> >> How about the following patch? Kiefer, Can you try this?
> >>
> >> diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
> >> index 22a21cc..c970a49 100644
> >> --- a/usr/iscsi/iscsid.c
> >> +++ b/usr/iscsi/iscsid.c
> >> @@ -1209,6 +1209,12 @@ static int iscsi_scsi_cmd_done(uint64_t nid, int result, struct scsi_cmd *scmd)
> >>        struct iscsi_task *task = ITASK(scmd);
> >>        uint32_t read_len = scsi_get_in_length(scmd);
> >>
> >> +       if (result == TASK_ABORTED) {
> >> +               list_del(&task->c_hlist);
> >> +               iscsi_free_task(task);
> >> +               return 0;
> >> +       }
> >> +
> >>        /*
> >>         * Since the connection is closed we just free the task.
> >>         * We could delay the closing of the conn in some cases and send
> >> diff --git a/usr/target.c b/usr/target.c
> >> index 21575dc..786bbb9 100644
> >> --- a/usr/target.c
> >> +++ b/usr/target.c
> >> @@ -1134,7 +1134,9 @@ static int abort_cmd(struct target* target, struct mgmt_req *mreq,
> >>                cmd->mreq = mreq;
> >>                err = -EBUSY;
> >>        } else {
> >> -               cmd->dev->cmd_done(target, cmd);
> >> +               cmd_hlist_remove(cmd);
> >> +               list_del(&cmd->qlist);
> >> +
> >>                target_cmd_io_done(cmd, TASK_ABORTED);
> >>        }
> >>        return err;
> >>
> >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe stgt" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
James R. Leu
Software Architect
INOC
608.204.0203
608.663.4555 fax
jleu at inoc.com
www.inoc.com

*** DELIVERING UPTIME ***
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.wpkg.org/pipermail/stgt/attachments/20110718/add4e267/attachment-0001.sig>


More information about the stgt mailing list