[sheepdog] sheep gateway-only crashes when invoke collie vdi object or collie cluster cleanup

levin li levin108 at gmail.com
Thu Aug 2 08:31:16 CEST 2012


On 2012年08月02日 12:57, Jens WEBER wrote:
>>> Setup
>>> root at sheep01:~# /etc/init.d/sheepdog status
>>> [ ok ] sheepdog gateway-only (-d -p 7000 -v 0 -z 999 /var/lib/sheepdog/disc0) 
>> is running.
>>> [ ok ] sheepdog for Disk A (-d -p 7001 -v 32 -z 1 /var/lib/sheepdog/disc1) is 
>> running.
>>> [ ok ] sheepdog for Disk B (-d -p 7002 -v 32 -z 2 /var/lib/sheepdog/disc2) is 
>> running.
>>> [ ok ] sheepdog for Disk C (-d -p 7003 -v 32 -z 3 /var/lib/sheepdog/disc3) is 
>> running.
>>> [ ok ] sheepdog for Disk D (-d -p 7004 -v 32 -z 4 /var/lib/sheepdog/disc4) is 
>> running.
>>> [ ok ] sheepdog for Disk E (-d -p 7005 -v 32 -z 5 /var/lib/sheepdog/disc5) is 
>> running.
>>> [ ok ] sheepdog for Disk F (-d -p 7006 -v 32 -z 6 /var/lib/sheepdog/disc6) is 
>> running.
>>>
>>> collie vdi object test -> sheep gateway-only crashes
>>> Jul 31 12:43:54 [main] listen_handler(839) accepted a new connection: 15
>>> Jul 31 12:43:54 [main] client_handler(777) 1, rx 0, tx 0
>>> Jul 31 12:43:54 [main] finish_rx(580) 15, 172.30.0.80:49199
>>> Jul 31 12:43:54 [main] queue_request(330) READ_PEER
>>> Jul 31 12:43:54 [io 1] do_process_work(1083) a4, 807c2b2500000000 , 95
>>> Jul 31 12:43:54 [main] crash_handler(409) sheep pid 23228 exited unexpectedly.
>>>
>>> collie cluster cleanup -> sheep gateway-only crashes
>>> Jul 31 12:46:27 [main] listen_handler(839) accepted a new connection: 15
>>> Jul 31 12:46:27 [main] client_handler(777) 1, rx 0, tx 0
>>> Jul 31 12:46:27 [main] finish_rx(580) 15, 127.0.0.1:33422
>>> Jul 31 12:46:27 [main] queue_request(330) CLEANUP
>>> Jul 31 12:46:27 [main] queue_cluster_request(345) CLEANUP (0x2276bd0)
>>> Jul 31 12:46:27 [main] cdrv_cpg_deliver(454) 3
>>> Jul 31 12:46:27 [main] sd_notify_handler(851) op CLEANUP, size: 96, from: 
>> IPv4 ip:172.30.0.80 port:7000
>>> Jul 31 12:46:27 [main] crash_handler(409) sheep pid 23703 exited unexpectedly.
>>>
>>> Cheers Jens
>>>
>>
>> Well, I tried this but can not reproduce it, can you give some more details to 
>> reproduce this bug?
>>
>> thanks
>>
>> levin
>>
> Nothing special, latest Debian Wheezy and latest sheedog from master
> Are the config files helpful? I will send later. Also I try again with a fresh formatted cluster. I can reproduce every time.
> 
> Thans Jens
> 

I tested the latest sheepdog master with the following script, and it works well:
==========================================
#!/bin/bash

pkill -9 sheep
rm /home/levin/code/store/* -rf
sheep/sheep -g -d /home/levin/code/store/0 -p 7000 -z 999;
for i in `seq 1 6`; do
    echo "initializing sheep on port 700$i"
	sheep/sheep -v 32 -d /home/levin/code/store/$i -p 700$i -z $i;
	sleep 1;
done

collie/collie cluster format -c 3
for ((i=0;i<5;i++)); do
	qemu-img create -f raw sheepdog:test$i 10M
	qemu-io -c "write -P 0x01 0 10M" sheepdog:test$i
done

collie node list
=========================================

The output:
initializing sheep on port 7001
initializing sheep on port 7002
initializing sheep on port 7003
initializing sheep on port 7004
initializing sheep on port 7005
initializing sheep on port 7006
using backend farm store
Formatting 'sheepdog:test0', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0.5427 sec (18.424 MiB/sec and 1.8424 ops/sec)
Formatting 'sheepdog:test1', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0.6068 sec (16.479 MiB/sec and 1.6479 ops/sec)
Formatting 'sheepdog:test2', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0.5769 sec (17.333 MiB/sec and 1.7333 ops/sec)
Formatting 'sheepdog:test3', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0.5272 sec (18.968 MiB/sec and 1.8968 ops/sec)
Formatting 'sheepdog:test4', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0.4698 sec (21.286 MiB/sec and 2.1286 ops/sec)
M   Id   Host:Port         V-Nodes       Zone
-    0   127.0.0.1:7000      	 0        999
-    1   127.0.0.1:7001      	32          1
-    2   127.0.0.1:7002      	32          2
-    3   127.0.0.1:7003      	32          3
-    4   127.0.0.1:7004      	32          4
-    5   127.0.0.1:7005      	32          5
-    6   127.0.0.1:7006      	32          6
[levin at taobao:~/code/sheepdog]$ collie cluster cleanup
[levin at taobao:~/code/sheepdog]$ collie vdi object test0 -i 2 
Looking for the object 0xfd34af00000002 (the inode vid 0xfd34af idx 2) with 7 nodes

127.0.0.1:7000 doesn't have the object
127.0.0.1:7001 doesn't have the object
127.0.0.1:7002 doesn't have the object
127.0.0.1:7003 has the object (should be 3 copies)
127.0.0.1:7004 has the object (should be 3 copies)
127.0.0.1:7005 has the object (should be 3 copies)
127.0.0.1:7006 doesn't have the object

Make sure you're using the code from latest sheepdog master, if it
still happens, I'd like to see your boot script.

thanks,

levin




More information about the sheepdog mailing list