[sheepdog-users] Cluster crash

Valerio Pachera sirio81 at gmail.com
Fri Dec 6 15:22:59 CET 2013


Hi,

I added the node named "sheepdog001" to my production cluster.
It was able to show the other nodes with 'dog node list'.

root@sheepdog001:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.6.41:7000        126  688302272
   1   192.168.6.42:7000        124  705079488
   2   192.168.6.43:7000        147  721856704
   3   192.168.6.44:7000        115  738633920

I also get the right messages in sheep.log

dog node recovery was showing
Dec 06 14:45:47   INFO [main] md_add_disk(310) /mnt/sheep/dsk01, vdisk nr 216, total disk 1
Dec 06 14:45:47   INFO [main] md_add_disk(310) /mnt/sheep/dsk02, vdisk nr 466, total disk 2
Dec 06 14:45:47   INFO [main] md_add_disk(310) /mnt/sheep/dsk03, vdisk nr 1863, total disk 3
Dec 06 14:45:48   INFO [main] send_join_request(778) IPv4 ip:192.168.6.41 port:7000
Dec 06 14:45:48   INFO [main] check_host_env(420) Allowed open files 1024000, suggested 6144000
Dec 06 14:45:48   INFO [main] main(821) sheepdog daemon (version 0.7.0_131_g88f0024) started

But if I run 'dog node list' or any other subcommand on the other nodes, it
hangs, so I have to press Ctrl+C.

In sheep.log on node sheepdog002 I get this:

Dec 06 14:45:48  EMERG [main] crash_handler(250) sheep exits unexpectedly (Aborted).

but the sheep process is still alive.


The log files on sheepdog003 and sheepdog004 are empty.
No strange messages in /var/log/syslog.
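
For anyone who wants to check each node's sheep.log for the same kind of
message, here is a minimal sketch in Python. It assumes the log line layout
seen in the excerpts above (timestamp, level, [thread], message); the function
names parse_entries and find_by_level are made up for this example and are not
part of sheepdog. Lines that do not start with a timestamp are treated as
continuations of the previous entry, which only matters for text wrapped by a
mail client.

```python
import re

# Assumed layout, taken from the excerpts in this mail:
#   "Dec 06 14:45:48  EMERG [main] crash_handler(250) sheep exits ..."
ENTRY_START = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<msg>.*)$"
)


def parse_entries(lines):
    """Turn raw log lines into dicts with ts, level, thread and msg keys."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # No timestamp: assume a wrapped continuation of the last entry.
            entries[-1]["msg"] += " " + line.strip()
    return entries


def find_by_level(lines, level="EMERG"):
    """Return only the entries logged at the given level."""
    return [e for e in parse_entries(lines) if e["level"] == level]
```

To use it, pass an open log file (or a list of pasted lines) to
find_by_level and print the ts and msg fields of each result.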

The guests are alive but unable to write anything to disks.
The cluster has crashed.

Sheepdog daemon version 0.7.0_131_g88f0024

Any hint?