[sheepdog] Segfault for 0.4.0 branch
Liu Yuan
namei.unix at gmail.com
Mon Jul 9 05:30:33 CEST 2012
On 07/09/2012 11:25 AM, Liu Yuan wrote:
> On 07/09/2012 09:58 AM, Liu Yuan wrote:
>> Got a weird segfault,
>>
>> (gdb) where
>> #0 0x0000000000411936 in do_process_work (work=0xd13c70) at ops.c:992
>> #1 0x000000000040ed05 in worker_routine (arg=0xd12a20) at work.c:171
>> #2 0x00007f43f992c971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>> #3 0x00007f43f8eeef3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #4 0x0000000000000000 in ?? ()
>>
>> sheep.log:
>> ...
>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:14, ::1:43328
>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43328
>> Jul 09 09:47:23 [main] cdrv_cpg_deliver(448) 5
>> Jul 09 09:47:23 [main] sd_notify_handler(851) size: 96, from: IPv4 ip:127.0.0.1 port:7000
>> Jul 09 09:47:23 [main] client_tx_handler(663) connection from: 13, ::1:43330
>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:13, ::1:43330
>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43330
>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 13
>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 14
>> Jul 09 09:47:23 [block] do_process_work(990) 80, 0 , 32579 <--- XXX
>> Jul 09 09:47:23 [main] client_rx_handler(577) connection from: 14, ::1:43337
>> Jul 09 09:47:23 [main] queue_request(323) 2
>> Jul 09 09:47:23 [main] crash_handler(408) sheep pid 5326 exited unexpectedly.
>>
>> Thanks,
>> Yuan
>>
>
> Yet another segfault.
>
> #0 __libc_free (mem=0x7f3301864000) at malloc.c:3709
> 3709 malloc.c: No such file or directory.
> in malloc.c
> (gdb) where
> #0 __libc_free (mem=0x7f3301864000) at malloc.c:3709
> #1 0x00000000004090a1 in free_request (req=0x7f32fc000a00) at sdnet.c:474
> #2 0x00000000004098bd in client_tx_handler (ci=0x7f32fc0143c0) at sdnet.c:656
> #3 0x0000000000409d32 in client_handler (fd=14, events=4, data=0x7f32fc0143c0) at sdnet.c:760
> #4 0x000000000041e470 in event_loop (timeout=-1) at event.c:179
> #5 0x0000000000404376 in main (argc=7, argv=0x7fff9f1566a8) at sheep.c:275
>
Again and again:
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
981 return !!op->process_main;
(gdb) where
#0 0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
#1 0x00000000004057e7 in prepare_cluster_msg (req=0xb03ca0, sizep=0x7fff129c3640) at group.c:275
#2 0x000000000040585c in cluster_op_done (work=0xb03d60) at group.c:290
#3 0x000000000040ebaf in bs_thread_request_done (fd=12, events=1, data=0x0) at work.c:135
#4 0x000000000041e470 in event_loop (timeout=-1) at event.c:179
#5 0x0000000000404376 in main (argc=7, argv=0x7fff129c4e98) at sheep.c:275
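For reference, frame #0 above dereferences a NULL op at ops.c:981. Below is a minimal standalone sketch of that failure mode. The struct layout, noop_main() and has_process_main_checked() are hypothetical names for illustration, not the definitions in ops.c, and a guard like this would only hide whatever leaves op NULL in prepare_cluster_msg().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the op template used in ops.c; only the
 * one field touched by the crashing line is modelled here. */
struct sd_op_template {
    int (*process_main)(void *req);
};

/* Hypothetical handler, used only to populate the field. */
static int noop_main(void *req)
{
    (void)req;
    return 0;
}

/* The crashing line is: return !!op->process_main;
 * With op == NULL that reads through address 0 and raises SIGSEGV.
 *
 * Defensive variant (an assumption, not the project's fix): treat a
 * missing op as "no main-thread handler" instead of dereferencing it. */
static bool has_process_main_checked(const struct sd_op_template *op)
{
    return op != NULL && op->process_main != NULL;
}
```

With this variant a NULL op yields false instead of crashing, which is useful for narrowing down the caller but does not explain why the request reached cluster_op_done() without an op.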
==========================
Program terminated with signal 11, Segmentation fault.
#0 0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at ../include/list.h:79
79 next->prev = prev;
(gdb) where
#0 0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at ../include/list.h:79
#1 0x000000000040e710 in list_del (entry=0x1582420) at ../include/list.h:90
#2 0x000000000040ece2 in worker_routine (arg=0x157aa20) at work.c:168
#3 0x00007fd02a8c6971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#4 0x00007fd029e88f3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
I can reproduce various kinds of segfaults with the following script almost
every time, so it seems that the for-0.4.0 branch is broken.
===================
#!/bin/bash
pkill -9 sheep
pkill -9 collie
rm store/* -rf
for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
sleep 3
collie/collie cluster format -c 3
sleep 1
for i in `seq 0 4`;do
collie/collie vdi create test$i 100M
done
for i in `seq 0 4`;do
dd if=/dev/urandom | collie/collie vdi write test$i -p 7000 &
done
sleep 3
for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 3;done;
for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i));done
echo wait for object recovery to finish
for ((;;)); do
if [ "$(pgrep collie)" ]; then
sleep 1
else
break
fi
done
for i in `seq 0 7`; do
for j in `seq 0 4`; do
./collie/collie vdi read test$j -p 700$i | md5sum
done
done