[sheepdog] Segfault for 0.4.0 branch

Liu Yuan namei.unix at gmail.com
Mon Jul 9 05:30:33 CEST 2012


On 07/09/2012 11:25 AM, Liu Yuan wrote:
> On 07/09/2012 09:58 AM, Liu Yuan wrote:
>> Got a weird segfault:
>>
>> (gdb) where
>> #0  0x0000000000411936 in do_process_work (work=0xd13c70) at ops.c:992
>> #1  0x000000000040ed05 in worker_routine (arg=0xd12a20) at work.c:171
>> #2  0x00007f43f992c971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>> #3  0x00007f43f8eeef3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #4  0x0000000000000000 in ?? ()
>>
>> sheep.log:
>> ...
>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:14, ::1:43328
>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43328
>> Jul 09 09:47:23 [main] cdrv_cpg_deliver(448) 5
>> Jul 09 09:47:23 [main] sd_notify_handler(851) size: 96, from: IPv4 ip:127.0.0.1 port:7000
>> Jul 09 09:47:23 [main] client_tx_handler(663) connection from: 13, ::1:43330
>> Jul 09 09:47:23 [main] client_handler(764) connection seems to be dead
>> Jul 09 09:47:23 [main] clear_client(703) refcnt:0, fd:13, ::1:43330
>> Jul 09 09:47:23 [main] destroy_client(672) connection from: ::1:43330
>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 13
>> Jul 09 09:47:23 [main] listen_handler(819) accepted a new connection: 14
>> Jul 09 09:47:23 [block] do_process_work(990) 80, 0 , 32579 <--- XXX
>> Jul 09 09:47:23 [main] client_rx_handler(577) connection from: 14, ::1:43337
>> Jul 09 09:47:23 [main] queue_request(323) 2
>> Jul 09 09:47:23 [main] crash_handler(408) sheep pid 5326 exited unexpectedly.
>>
>> Thanks,
>> Yuan
>>
> 
> Yet another segfault.
> 
> #0  __libc_free (mem=0x7f3301864000) at malloc.c:3709
> 3709	malloc.c: No such file or directory.
> 	in malloc.c
> (gdb) where
> #0  __libc_free (mem=0x7f3301864000) at malloc.c:3709
> #1  0x00000000004090a1 in free_request (req=0x7f32fc000a00) at sdnet.c:474
> #2  0x00000000004098bd in client_tx_handler (ci=0x7f32fc0143c0) at sdnet.c:656
> #3  0x0000000000409d32 in client_handler (fd=14, events=4, data=0x7f32fc0143c0) at sdnet.c:760
> #4  0x000000000041e470 in event_loop (timeout=-1) at event.c:179
> #5  0x0000000000404376 in main (argc=7, argv=0x7fff9f1566a8) at sheep.c:275
> 

Again and again:

Program terminated with signal 11, Segmentation fault.
#0  0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
981		return !!op->process_main;
(gdb) where
#0  0x00000000004118b4 in has_process_main (op=0x0) at ops.c:981
#1  0x00000000004057e7 in prepare_cluster_msg (req=0xb03ca0, sizep=0x7fff129c3640) at group.c:275
#2  0x000000000040585c in cluster_op_done (work=0xb03d60) at group.c:290
#3  0x000000000040ebaf in bs_thread_request_done (fd=12, events=1, data=0x0) at work.c:135
#4  0x000000000041e470 in event_loop (timeout=-1) at event.c:179
#5  0x0000000000404376 in main (argc=7, argv=0x7fff129c4e98) at sheep.c:275

==========================

Program terminated with signal 11, Segmentation fault.
#0  0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at ../include/list.h:79
79		next->prev = prev;
(gdb) where
#0  0x000000000040e6d9 in __list_del (prev=0x21, next=0x0) at ../include/list.h:79
#1  0x000000000040e710 in list_del (entry=0x1582420) at ../include/list.h:90
#2  0x000000000040ece2 in worker_routine (arg=0x157aa20) at work.c:168
#3  0x00007fd02a8c6971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#4  0x00007fd029e88f3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()


I can reproduce various kinds of segfaults with the following script
almost every time, so it seems that the for-0.4.0 branch is broken.

===================

#!/bin/bash

pkill -9 sheep
pkill -9 collie
rm -rf store/*
for i in `seq 0 7`; do
	sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i))
done
sleep 3
collie/collie cluster format  -c 3
sleep 1

for i in `seq 0 4`;do
	collie/collie vdi create test$i 100M
done

for i in `seq 0 4`;do
dd if=/dev/urandom | collie/collie vdi write test$i -p 7000 &
done

sleep 3
for i in 1 2 3 4 5; do
	pkill -f "sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p 700$i"
	sleep 3
done
for i in `seq 1 5`; do
	sheep/sheep -d /home/tailai.ly/sheepdog/store/$i -z $i -p $((7000+$i))
done

echo wait for object recovery to finish
for ((;;)); do
        if [ "$(pgrep collie)" ]; then
                sleep 1
        else
                break
        fi
done

for i in `seq 0 7`; do
	for j in `seq 0 4`; do
		./collie/collie vdi read test$j -p 700$i | md5sum
	done
done
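
The script above relies on the pkill -f pattern being exactly the same
string as the command line used to start each sheep, and it spells the port
two ways (700$i and $((7000+$i))). A possible way to keep the two in sync is
to build the command line in one place. This is only a sketch: sheep_cmd and
STORE are names made up for illustration, not part of sheepdog, and the store
path is just the one used in the script.

```shell
#!/bin/bash
# Sketch only: sheep_cmd is a hypothetical helper, not part of sheepdog.
# It prints the command line for node $1, so that starting a node and
# matching it with pkill -f share a single definition.
STORE=/home/tailai.ly/sheepdog/store

sheep_cmd() {
	echo "sheep/sheep -d $STORE/$1 -z $1 -p $((7000 + $1))"
}

# Start node 3 (unquoted on purpose so the string splits into arguments):
#   $(sheep_cmd 3)
# Kill node 3 by matching its full command line:
#   pkill -f "$(sheep_cmd 3)"
```

The unquoted $(sheep_cmd 3) only works because none of the arguments contain
spaces, which holds for the paths used here.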



