[sheepdog-users] Cluster Crash

Fri Feb 14 16:13:49 CET 2014

Hi all, I have a production cluster with
  Sheepdog daemon version 0.7.0_131_g88f0024
  Corosync 1.4.6
  Qemu 1.6.0.

Today it crashed, probably because of a bridge malfunction (see error
below).
All sheep daemons were dead.
I killed all qemu processes, shutdown and restarted all 4 servers (and
removed the bridge that may have cause the issue).

Now node id 0  is recovering.

dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.6.41:7000        126  688302272
   1   192.168.6.42:7000        124  705079488
   2   192.168.6.43:7000        147  721856704
   3   192.168.6.44:7000        115  738633920

Nodes In Recovery:
  Id   Host:Port         V-Nodes       Zone       Progress
   0   192.168.6.41:7000     126  688302272        7.8%

I tried to run 2 guests that I need to be up, but in both cases they stop
boot complaining about file system inconsistency (press ctrl + d to reboot
or give root password for maintenance).

On guest vdi is called 'backup' and is 10G.
I killed the guest and run vdi check on it but at the next boot it behaves
the same.
The other vdi is 100G so I it would take long time to check.

May it because node id0 is still recovering?

Any other idea...?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.wpkg.org/pipermail/sheepdog-users/attachments/20140214/88d2abf8/attachment-0004.html>