[sheepdog-users] Cluster Crash
Valerio Pachera
sirio81 at gmail.com
Fri Feb 14 16:13:49 CET 2014
Hi all, I have a production cluster with
Sheepdog daemon version 0.7.0_131_g88f0024
Corosync 1.4.6
Qemu 1.6.0.
Today it crashed, probably because of a bridge malfunction (see error
below).
All sheep daemons were dead.
I killed all qemu processes, shutdown and restarted all 4 servers (and
removed the bridge that may have cause the issue).
Now node id 0 is recovering.
dog node list
Id Host:Port V-Nodes Zone
0 192.168.6.41:7000 126 688302272
1 192.168.6.42:7000 124 705079488
2 192.168.6.43:7000 147 721856704
3 192.168.6.44:7000 115 738633920
Nodes In Recovery:
Id Host:Port V-Nodes Zone Progress
0 192.168.6.41:7000 126 688302272 7.8%
I tried to run 2 guests that I need to be up, but in both cases they stop
boot complaining about file system inconsistency (press ctrl + d to reboot
or give root password for maintenance).
On guest vdi is called 'backup' and is 10G.
I killed the guest and run vdi check on it but at the next boot it behaves
the same.
The other vdi is 100G so I it would take long time to check.
May it because node id0 is still recovering?
Any other idea...?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.wpkg.org/pipermail/sheepdog-users/attachments/20140214/88d2abf8/attachment-0004.html>
More information about the sheepdog-users
mailing list