[sheepdog] Segfault for 0.4.0 branch

Yunkai Zhang yunkai.me at gmail.com
Tue Jul 10 19:37:35 CEST 2012


On Mon, Jul 9, 2012 at 11:30 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 07/09/2012 11:25 AM, Liu Yuan wrote:
>> On 07/09/2012 09:58 AM, Liu Yuan wrote:
>>> Got a weird segfault,
>>>
>>> (gdb) where
>>> #0  0x0000000000411936 in do_process_work (work=0xd13c70) at ops.c:992
>>> #1  0x000000000040ed05 in worker_routine (arg=0xd12a20) at work.c:171
>>> #2  0x00007f43f992c971 in start_thread (arg=<value optimized out>) at
>>> pthread_create.c:304
>>> #3  0x00007f43f8eeef3d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #4  0x0000000000000000 in ?? ()
>>>
>>> sheep.log:
>>> ...
>>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:14, ::1:43328
>>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43328
>>> Jul 09 09:47:23 [main] cdrv_cpg_deliver(448) 5
>>> Jul 09 09:47:23 [main] sd_notify_handler(851) size: 96, from: IPv4
>>> ip:127.0.0.1 port:7000
>>> Jul 09 09:47:23 [main] client_tx_handler(663) connection from: 13, ::1:43330
>>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:13, ::1:43330
>>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43330
>>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 13
>>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 14
>>> Jul 09 09:47:23 [block] do_process_work(990) 80, 0 , 32579 <--- XXX
>>> Jul 09 09:47:23 [main] client_rx_handler(577) connection from: 14, ::1:43337
>>> Jul 09 09:47:23 [main] queue_request(323) 2
>>> Jul 09 09:47:23 [main] crash_handler(408) sheep pid 5326 exited
>>> unexpectedly.
>>>
>>> Thanks,
>>> Yuan
>>>
>>
>> Yet another segfault.
>>
>> #0  __libc_free (mem=0x7f3301864000) at malloc.c:3709
>> 3709  malloc.c: No such file or directory.
>>       in malloc.c
>> (gdb) where
>> #0  __libc_free (mem=0x7f3301864000) at malloc.c:3709
>> #1  0x00000000004090a1 in free_request (req=0x7f32fc000a00) at sdnet.c:474
>> #2  0x00000000004098bd in client_tx_handler (ci=0x7f32fc0143c0) at
>> sdnet.c:656
>> #3  0x0000000000409d32 in client_handler (fd=14, events=4,
>> data=0x7f32fc0143c0) at sdnet.c:760
>> #4  0x000000000041e470 in event_loop (timeout=-1) at event.c:179
>> #5  0x0000000000404376 in main (argc=7, argv=0x7fff9f1566a8) at sheep.c:275
>>
>
> Again and again:
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
> 981             return !!op->process_main;

I have fixed this segmentation fault; see my newest patch.
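
For illustration only (this is not the patch itself), the crash at
ops.c:981 is a plain NULL dereference: has_process_main() is called with
op=0x0. A minimal sketch of a NULL-safe check follows. The struct layout
below is a hypothetical stand-in, not the real definition from ops.c, and
a guard like this only avoids the crash; it does not explain why the
request carried no op in the first place.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the operation descriptor. The real
 * definition in ops.c has more fields; only the callback that
 * has_process_main() inspects is modelled here.
 */
struct sd_op_template {
	int (*process_main)(void *req);
};

/*
 * NULL-safe variant of the check that crashed in the backtrace.
 * Returns 1 only when op is non-NULL and has a process_main callback.
 */
int has_process_main(const struct sd_op_template *op)
{
	return op != NULL && op->process_main != NULL;
}
```

With this shape, a caller such as prepare_cluster_msg() would get 0 for a
request whose op was never resolved, instead of faulting.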

> (gdb) where
> #0  0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
> #1  0x00000000004057e7 in prepare_cluster_msg (req=0xb03ca0,
> sizep=0x7fff129c3640) at group.c:275
> #2  0x000000000040585c in cluster_op_done (work=0xb03d60) at group.c:290
> #3  0x000000000040ebaf in bs_thread_request_done (fd=12, events=1,
> data=0x0) at work.c:135
> #4  0x000000000041e470 in event_loop (timeout=-1) at event.c:179
> #5  0x0000000000404376 in main (argc=7, argv=0x7fff129c4e98) at sheep.c:275
>
> ==========================
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at
> ../include/list.h:79
> 79              next->prev = prev;
> (gdb) where
> #0  0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at
> ../include/list.h:79
> #1  0x000000000040e710 in list_del (entry=0x1582420) at ../include/list.h:90
> #2  0x000000000040ece2 in worker_routine (arg=0x157aa20) at work.c:168
> #3  0x00007fd02a8c6971 in start_thread (arg=<value optimized out>) at
> pthread_create.c:304
> #4  0x00007fd029e88f3d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #5  0x0000000000000000 in ?? ()
>
>
> I can reproduce various kinds of segfaults with the following script almost
> every time; it seems that the for-0.4.0 branch is broken.
>
> ===================
>
> #!/bin/bash
>
> pkill -9 sheep
> pkill -9 collie
> rm store/* -rf
> for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
> sleep 3
> collie/collie cluster format  -c 3
> sleep 1
>
> for i in `seq 0 4`;do
>         collie/collie vdi create test$i 100M
> done
>
> for i in `seq 0 4`;do
> dd if=/dev/urandom | collie/collie vdi write test$i -p 7000 &
> done
>
> sleep 3
> for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 3;done;
> for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
>
> echo wait for object recovery to finish
> for ((;;)); do
>         if [ "$(pgrep collie)" ]; then
>                 sleep 1
>         else
>                 break
>         fi
> done
>
> for i in `seq 0 7`; do
>         for j in `seq 0 4`; do
>                 ./collie/collie vdi read test$j -p 700$i | md5sum
>         done
> done
>
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog



-- 
Yunkai Zhang
Work at Taobao


