[sheepdog-users] Cluster hung...

Thu Jul 26 15:33:15 CEST 2012

Hi List,

I have a small cluster, 3 nodes, with 1 gateway each and on one
node only one working sheep, and three working sheeps on the
other two nodes...

When a node fails, the recovery process starts as expected, but
when the failed node joins again, the cluster hangs for a long
time without responding to a lot of collie commands...
collie node info and collie node recovery dont give an answer
for at least 20 minutes.

The connected kvm guest cant access the VDIs in this time and
the windows guests dont survive this time...

I am using sheepdog from sheepdog_0.4.0-0+tek2b-7_amd64.deb...

Could someone explain me briefly what happens here and if I
can avoid these hung?

Thanks

Bastian