[sheepdog] sheep gateway-only crashes when invoking collie vdi object or collie cluster cleanup

Jens WEBER jweber at tek2b.org
Thu Aug 2 21:37:52 CEST 2012


> I tested the latest sheepdog master with the following script, and it works 
> well:
> ==========================================
> #!/bin/bash
> 
> pkill -9 sheep
> rm /home/levin/code/store/* -rf
> sheep/sheep -g -d /home/levin/code/store/0 -p 7000 -z 999;
> for i in `seq 1 6`; do
>     echo "initializing sheep on port 700$i"
> 	sheep/sheep -v 32 -d /home/levin/code/store/$i -p 700$i -z $i;
> 	sleep 1;
> done
> 
> collie/collie cluster format -c 3
> for ((i=0;i<5;i++)); do
> 	qemu-img create -f raw sheepdog:test$i 10M
> 	qemu-io -c "write -P 0x01 0 10M" sheepdog:test$i
> done
> 
> collie node list
> =========================================
> 
> The output:
> initializing sheep on port 7001
> initializing sheep on port 7002
> initializing sheep on port 7003
> initializing sheep on port 7004
> initializing sheep on port 7005
> initializing sheep on port 7006
> using backend farm store
> Formatting 'sheepdog:test0', fmt=raw size=10485760 
> wrote 10485760/10485760 bytes at offset 0
> 10 MiB, 1 ops; 0.5427 sec (18.424 MiB/sec and 1.8424 ops/sec)
> Formatting 'sheepdog:test1', fmt=raw size=10485760 
> wrote 10485760/10485760 bytes at offset 0
> 10 MiB, 1 ops; 0.6068 sec (16.479 MiB/sec and 1.6479 ops/sec)
> Formatting 'sheepdog:test2', fmt=raw size=10485760 
> wrote 10485760/10485760 bytes at offset 0
> 10 MiB, 1 ops; 0.5769 sec (17.333 MiB/sec and 1.7333 ops/sec)
> Formatting 'sheepdog:test3', fmt=raw size=10485760 
> wrote 10485760/10485760 bytes at offset 0
> 10 MiB, 1 ops; 0.5272 sec (18.968 MiB/sec and 1.8968 ops/sec)
> Formatting 'sheepdog:test4', fmt=raw size=10485760 
> wrote 10485760/10485760 bytes at offset 0
> 10 MiB, 1 ops; 0.4698 sec (21.286 MiB/sec and 2.1286 ops/sec)
> M   Id   Host:Port         V-Nodes       Zone
> -    0   127.0.0.1:7000      	 0        999
> -    1   127.0.0.1:7001      	32          1
> -    2   127.0.0.1:7002      	32          2
> -    3   127.0.0.1:7003      	32          3
> -    4   127.0.0.1:7004      	32          4
> -    5   127.0.0.1:7005      	32          5
> -    6   127.0.0.1:7006      	32          6
> [levin at taobao:~/code/sheepdog]$ collie cluster cleanup
> [levin at taobao:~/code/sheepdog]$ collie vdi object test0 -i 2 
> Looking for the object 0xfd34af00000002 (the inode vid 0xfd34af idx 2) with 7 
> nodes
> 
> 127.0.0.1:7000 doesn't have the object
> 127.0.0.1:7001 doesn't have the object
> 127.0.0.1:7002 doesn't have the object
> 127.0.0.1:7003 has the object (should be 3 copies)
> 127.0.0.1:7004 has the object (should be 3 copies)
> 127.0.0.1:7005 has the object (should be 3 copies)
> 127.0.0.1:7006 doesn't have the object
> 
> Make sure you're using the code from latest sheepdog master, if it
> still happens, I'd like to see your boot script.
> 
> thanks,
> 
> levin
> 

The problem appears after a cluster shutdown and restart of the cluster. Here is my script:
=========================================
#!/bin/bash

pkill -9 sheep

cd /var/lib/sheepdog
rm -rf disc*/*
for DIR in disc*; do
        ln -s "/etc/sheepdog/$DIR.setup" "$DIR/setup";
done

/etc/init.d/sheepdog start

collie cluster format -c 3
for ((i=0;i<5;i++)); do
        qemu-img create -f raw sheepdog:test$i 10M
        qemu-io -c "write -P 0x01 0 10M" sheepdog:test$i
done
echo "collie vdi object test2 -i 2 # ok, no problem"
collie vdi object test2 -i 2 # ok, no problem
echo "collie cluster cleanup       # ok, no problem"
collie cluster cleanup       # ok, no problem

# but now ...
echo "collie cluster shutdown"
collie cluster shutdown
sleep 6
/etc/init.d/sheepdog start

echo "collie vdi object test2 -i 2 # gateway-only crashes !!!!"
collie vdi object test2 -i 2 # gateway-only crashes !!!!

/etc/init.d/sheepdog start   # brings the crashed gateway back; the other sheep are still running

echo "collie cluster cleanup       # gateway-only crashes !!!!"
collie cluster cleanup       # gateway-only crashes !!!!
=========================================
The output:
[ ok ] Starting sheepdog : sheepdog gateway-only.
[ ok ] Starting sheepdog : sheepdog for Disk A.
[ ok ] Starting sheepdog : sheepdog for Disk B.
[ ok ] Starting sheepdog : sheepdog for Disk C.
[ ok ] Starting sheepdog : sheepdog for Disk D.
[ ok ] Starting sheepdog : sheepdog for Disk E.
[ ok ] Starting sheepdog : sheepdog for Disk F.
[ ok ] sheepdog gateway-only (-d -p 7000 -g -z 999 /var/lib/sheepdog/disc0) is running.
[ ok ] sheepdog for Disk A (-d -p 7001 -v 32 -z 1 /var/lib/sheepdog/disc1) is running.
[ ok ] sheepdog for Disk B (-d -p 7002 -v 32 -z 2 /var/lib/sheepdog/disc2) is running.
[ ok ] sheepdog for Disk C (-d -p 7003 -v 32 -z 3 /var/lib/sheepdog/disc3) is running.
[ ok ] sheepdog for Disk D (-d -p 7004 -v 32 -z 4 /var/lib/sheepdog/disc4) is running.
[ ok ] sheepdog for Disk E (-d -p 7005 -v 32 -z 5 /var/lib/sheepdog/disc5) is running.
[ ok ] sheepdog for Disk F (-d -p 7006 -v 32 -z 6 /var/lib/sheepdog/disc6) is running.
using backend farm store
Formatting 'sheepdog:test0', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0:00:06.25 (1.599 MiB/sec and 0.1599 ops/sec)
Formatting 'sheepdog:test1', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0:00:04.60 (2.171 MiB/sec and 0.2171 ops/sec)
Formatting 'sheepdog:test2', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0:00:04.47 (2.233 MiB/sec and 0.2233 ops/sec)
Formatting 'sheepdog:test3', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0:00:06.08 (1.642 MiB/sec and 0.1642 ops/sec)
Formatting 'sheepdog:test4', fmt=raw size=10485760 
wrote 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 0:00:04.46 (2.239 MiB/sec and 0.2239 ops/sec)
collie vdi object test2 -i 2 # ok, no problem
Looking for the object 0xfd381500000002 (the inode vid 0xfd3815 idx 2) with 7 nodes

172.30.0.80:7000 doesn't have the object
172.30.0.80:7001 has the object (should be 3 copies)
172.30.0.80:7002 has the object (should be 3 copies)
172.30.0.80:7003 doesn't have the object
172.30.0.80:7004 doesn't have the object
172.30.0.80:7005 doesn't have the object
172.30.0.80:7006 has the object (should be 3 copies)
collie cluster cleanup       # ok, no problem
collie cluster shutdown
[ ok ] Starting sheepdog : sheepdog gateway-only.
[ ok ] Starting sheepdog : sheepdog for Disk A.
[ ok ] Starting sheepdog : sheepdog for Disk B.
[ ok ] Starting sheepdog : sheepdog for Disk C.
[ ok ] Starting sheepdog : sheepdog for Disk D.
[ ok ] Starting sheepdog : sheepdog for Disk E.
[ ok ] Starting sheepdog : sheepdog for Disk F.
[ ok ] sheepdog gateway-only (-d -p 7000 -g -z 999 /var/lib/sheepdog/disc0) is running.
[ ok ] sheepdog for Disk A (-d -p 7001 -v 32 -z 1 /var/lib/sheepdog/disc1) is running.
[ ok ] sheepdog for Disk B (-d -p 7002 -v 32 -z 2 /var/lib/sheepdog/disc2) is running.
[ ok ] sheepdog for Disk C (-d -p 7003 -v 32 -z 3 /var/lib/sheepdog/disc3) is running.
[ ok ] sheepdog for Disk D (-d -p 7004 -v 32 -z 4 /var/lib/sheepdog/disc4) is running.
[ ok ] sheepdog for Disk E (-d -p 7005 -v 32 -z 5 /var/lib/sheepdog/disc5) is running.
[ ok ] sheepdog for Disk F (-d -p 7006 -v 32 -z 6 /var/lib/sheepdog/disc6) is running.
collie vdi object test2 -i 2 # gateway-only crashes !!!!
[main] do_read(269) failed to read from socket: 0
[main] exec_req(356) failed to read a response
Failed to connect to 172.30.0.80:7000
The node list has changed: please try again
The node list has changed: please try again
The node list has changed: please try again
The node list has changed: please try again
The node list has changed: please try again
The node list has changed: please try again
Failed to read the inode object 0xfd3815
[ ok ] Starting sheepdog : sheepdog gateway-only.
[warn] Starting sheepdog : sheepdog for Disk A is already running (warning).
[warn] Starting sheepdog : sheepdog for Disk B is already running (warning).
[warn] Starting sheepdog : sheepdog for Disk C is already running (warning).
[warn] Starting sheepdog : sheepdog for Disk D is already running (warning).
[warn] Starting sheepdog : sheepdog for Disk E is already running (warning).
[warn] Starting sheepdog : sheepdog for Disk F is already running (warning).
[ ok ] sheepdog gateway-only (-d -p 7000 -g -z 999 /var/lib/sheepdog/disc0) is running.
[ ok ] sheepdog for Disk A (-d -p 7001 -v 32 -z 1 /var/lib/sheepdog/disc1) is running.
[ ok ] sheepdog for Disk B (-d -p 7002 -v 32 -z 2 /var/lib/sheepdog/disc2) is running.
[ ok ] sheepdog for Disk C (-d -p 7003 -v 32 -z 3 /var/lib/sheepdog/disc3) is running.
[ ok ] sheepdog for Disk D (-d -p 7004 -v 32 -z 4 /var/lib/sheepdog/disc4) is running.
[ ok ] sheepdog for Disk E (-d -p 7005 -v 32 -z 5 /var/lib/sheepdog/disc5) is running.
[ ok ] sheepdog for Disk F (-d -p 7006 -v 32 -z 6 /var/lib/sheepdog/disc6) is running.
collie cluster cleanup       # gateway-only crashes !!!!
[main] do_read(269) failed to read from socket: 0
[main] exec_req(356) failed to read a response
failed to connect to  localhost:7000
failed to execute request
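
To confirm which daemons are still alive after each collie call, a check like the one below could be run between the steps of the script. It is only a sketch and not part of sheepdog: the function names port_is_open and check_sheep_ports are made up for illustration, the host defaults to localhost as an assumption, and the ports 7000 to 7006 are taken from the init script output above. It relies on bash's /dev/tcp redirection, so it reports only whether a port accepts a TCP connection, not whether the sheep behind it is healthy.

```shell
#!/bin/bash
# Hypothetical helper, not part of sheepdog: report which sheep ports
# still accept TCP connections.

# Assumption: the daemons listen on this host; override with HOST=...
HOST=${HOST:-localhost}

port_is_open() {
    # Try to open a TCP connection via bash's /dev/tcp pseudo-device.
    # The subshell closes the descriptor again when it exits.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

check_sheep_ports() {
    # Usage: check_sheep_ports HOST PORT...
    local host=$1
    shift
    local port
    for port in "$@"; do
        if port_is_open "$host" "$port"; then
            echo "$host:$port up"
        else
            echo "$host:$port DOWN"
        fi
    done
}

# Ports as shown in the init script output (7000 is the gateway-only sheep).
check_sheep_ports "$HOST" 7000 7001 7002 7003 7004 7005 7006
```

Calling it right after "collie vdi object test2 -i 2" and again after "collie cluster cleanup" would show whether only port 7000 goes down, as the init script warnings suggest, or whether other sheep are affected too.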

Thanks, Jens


