[Stgt-devel] disk kicked out of RAID -> tgtd segmentation fault

ronnie sahlberg ronniesahlberg
Thu Jul 3 02:58:33 CEST 2008


Hi Tomasz

I had no problems running tgtd under gdb. Just let it start and
fork() first, then run
ps aux | grep tgtd and use gdb -p PID to attach to each of the two processes.


What appears to happen is that the command has already been removed
from the list it was linked into through the c_hlist member of
struct scsi_cmd, so c_hlist is no longer on any list at all:
next == prev == NULL, which is the state list_del() leaves an entry in.

Thus the list_del() helper causes a SEGV: it assumes that the entry
is always linked into a list, so it dereferences the next/prev
pointers without checking them.
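
To make the failure mode concrete, here is a small standalone sketch. It is
not the patch and not the tgtd code: the struct is only modelled loosely on
usr/list.h, and list_init(), list_add_head(), list_entry_unlinked() and
list_del_checked() are hypothetical names made up for this illustration.
A second delete of the same entry finds next == prev == NULL and has to
bail out instead of dereferencing them.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on usr/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list head points at itself (it is never NULL). */
static inline void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Link item in directly after head. */
static inline void list_add_head(struct list_head *item,
				 struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Hypothetical helper: true if a previous delete already unlinked entry. */
static inline int list_entry_unlinked(const struct list_head *entry)
{
	assert(entry != NULL);
	return entry->next == NULL && entry->prev == NULL;
}

/*
 * Hypothetical guarded delete. Returns 1 if the entry was unlinked now,
 * 0 if it was already off the list (the case that crashes unguarded).
 */
static inline int list_del_checked(struct list_head *entry)
{
	if (list_entry_unlinked(entry))
		return 0;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = entry->prev = NULL;
	return 1;
}
```

A return value of 0 here marks a double removal, so logging it at the call
site might help locate the code path that deletes the entry twice.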


Tomasz, can you try the patch below the gdb backtrace?
It prevents the SEGV for me.


This solves one of the bugs: list_del() gets a SEGV when it is called
for an entry that is not on a list.
There is probably another bug somewhere as well, where tgtd has lost
track of which tasks are active and has forgotten that this task
has already been removed from the list, causing it to
call list_del() for a task that is not on the list.
I.e. the task is referenced from several places, and when it was
deleted tgtd removed it from this list but forgot to remove
it from some other list/place.
I have no idea where that bug is.


regards
ronnie s


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc9c63056e0 (LWP 4124)]
0x0000000000412ed4 in __list_del (prev=0x0, next=0x0) at list.h:79
79              next->prev = prev;
(gdb) bt
#0  0x0000000000412ed4 in __list_del (prev=0x0, next=0x0) at list.h:79
#1  0x0000000000412ea3 in list_del (entry=0x642868) at list.h:85
#2  0x000000000041363d in cmd_hlist_remove (cmd=0x642860) at target.c:308
#3  0x0000000000414d92 in __cmd_done (target=0x0, cmd=0x642860) at target.c:862
#4  0x0000000000414fab in target_cmd_done (cmd=0x642860) at target.c:906
#5  0x0000000000406faf in iscsi_free_cmd_task (task=0x6427a0)
    at iscsi/iscsid.c:1081
#6  0x0000000000403613 in conn_close (conn=0x63dc88) at iscsi/conn.c:112
#7  0x000000000040ca19 in iscsi_tcp_event_handler (fd=14, events=5,
    data=0x63dc88) at iscsi/iscsi_tcp.c:166
#8  0x0000000000411513 in event_loop () at tgtd.c:251
#9  0x00000000004118e2 in main (argc=1, argv=0x7fffce311678) at tgtd.c:355


diff --git a/usr/list.h b/usr/list.h
index 4d76057..39222ab 100644
--- a/usr/list.h
+++ b/usr/list.h
@@ -82,6 +82,9 @@ static inline void __list_del(struct list_head * prev, struct

 static inline void list_del(struct list_head *entry)
 {
+       if ((entry->prev == NULL) && (entry->next == NULL)) {
+               return;
+       }
        __list_del(entry->prev, entry->next);
        entry->next = entry->prev = NULL;
 }



On Mon, Jun 30, 2008 at 7:05 PM, Tomasz Chmielewski <mangoo at wpkg.org> wrote:
> Tomasz Chmielewski schrieb:
>
> (...)
>
>> initiator# iptables -I INPUT -s <target IP> -p tcp --sport 3260 -j DROP
>>
>>
>> After a while, you will see that only one tgtd process is running, whereas
>> the second has crashed.
>
> (...)
>
>> The above is valid with tgt-20080527, I'm just about to try tgt-20080629.
>
> It still crashes with tgt-20080629.
>
>
> --
> Tomasz Chmielewski
> http://wpkg.org
>


