MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:

> Chris Webb wrote:
> >
> > If the failed node is just partitioned away from the rest of the cluster
> > rather than failing, what's supposed to happen to the sheep instances and
> > the qemus on it? I saw operations hang indefinitely, which is the intended
>
> Sheepdog cannot distinguish a temporarily disconnected node from a
> failed one, so the sheep instances will abort and qemus will hang
> forever.

That's fine. It's safe behaviour, and presumably I can also detect it on the host and reboot. I don't particularly need the node to recover automatically; I just want to be reasonably sure that its sheep and qemus won't spontaneously restart without intervention if I have already restarted them elsewhere after the node vanished!

So just to be clear: is the expected behaviour that the sheep on the isolated node will exit because they can no longer continue? I think I may have seen a hang rather than an exit, but I'll recheck with a recent corosync, as I suspect I accidentally ran my previous test with a relatively elderly one.

Best wishes,

Chris.