[sheepdog-users] node recovery after reboot, virtual machines slow to boot

Philip Crotwell crotwell at seis.sc.edu
Thu Oct 2 18:08:11 CEST 2014


I have a small sheepdog cluster, and when I reboot a node, for example
after applying security patches, it takes a long time to recover
afterwards. The data volume is not that large, 545 GB, but it takes
close to an hour to finish the recovery. The problem is that virtual
machines on the rebooted node do not themselves boot until the
recovery finishes, so a node reboot that takes maybe 2 minutes means
an hour of downtime for the virtual machines. Virsh itself also locks
up during the recovery process, so you can't even run "virsh list".

It seems like qemu/libvirt on the node should continue to function
during the recovery process by making use of the other nodes that are
up and functional. Is this possible? Is there any other way to make it
so the virtual machines can start up before the recovery process is
finished? Or to reduce the time it takes to do the recovery process?

This is on Ubuntu Trusty (14.04), so sheepdog 0.7.5-1. Is this
improved in 0.8, which will be in 14.10 later this month?

Here is an example libvirt device for a sheepdog disk:
  <disk type='network' device='disk'>
      <driver name='qemu'/>
      <source protocol='sheepdog' name='xxxxxx'/>
      <target dev='hda' bus='ide'/>
  </disk>

I do not have <host> elements. Would explicitly adding multiple hosts help?
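
For illustration, a disk definition with an explicit <host> element
might look like the sketch below. The host name is a placeholder, and
7000 is assumed as the port because it is the sheepdog default. Only a
single <host> is shown; whether more than one is accepted for the
sheepdog protocol is exactly the open question here.
  <disk type='network' device='disk'>
      <driver name='qemu'/>
      <source protocol='sheepdog' name='xxxxxx'>
          <!-- placeholder host name; 7000 is the default sheep port -->
          <host name='sheep-node1.example.com' port='7000'/>
      </source>
      <target dev='hda' bus='ide'/>
  </disk>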

