[sheepdog-users] Cluster hung...
Liu Yuan
namei.unix at gmail.com
Fri Jul 27 04:06:07 CEST 2012
On 07/26/2012 09:33 PM, Bastian Scholz wrote:
> Hi List,
>
> I have a small cluster, 3 nodes, with 1 gateway each; one node
> has only one working sheep, and the other two nodes have three
> working sheep...
>
> When a node fails, the recovery process starts as expected, but
> when the failed node joins again, the cluster hangs for a long
> time and many collie commands get no response...
> collie node info and collie node recovery don't give an answer
> for at least 20 minutes.
>
> The connected KVM guests can't access the VDIs during this time,
> and the Windows guests don't survive it...
>
> I am using sheepdog from sheepdog_0.4.0-0+tek2b-7_amd64.deb...
>
> Could someone briefly explain what happens here and whether I
> can avoid these hangs?
>
I can't reproduce this on the latest master. I tried the following steps:
1) Start 5 sheep: node 0 as gateway only, nodes [1-4] as storage nodes.
VM <---> g(0) <---> s(1,2,3,4)
2) Install a new OS in the VM.
3) During the installation, simulate a node failure in two ways:
a) kill -9 pid (one of node[1-4]), then join it back
b) collie node kill node_id (one of [1-4]), then join it back
In both cases, the installation in the VM continued without any problem.
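
To tell whether collie commands really stop answering after the failed
node joins back, something like the following rough sketch could be run
on one of the nodes. It assumes GNU coreutils `timeout` is installed;
`responds_within` is a hypothetical helper name used only for
illustration and is not part of sheepdog.

```shell
#!/bin/sh
# Hypothetical helper (not part of sheepdog): run a command and report
# whether it finished within the given number of seconds.
# Assumption: GNU coreutils `timeout`, which exits with status 124 when
# the time limit is hit and 126/127 when the command cannot be run.
responds_within() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        124)
            echo "HUNG after ${limit}s: $*"
            return 1
            ;;
        126|127)
            echo "could not run: $*"
            return 2
            ;;
        *)
            echo "answered (exit status $rc): $*"
            return 0
            ;;
    esac
}
```

For example, right after the node has joined back, `responds_within 10
collie node info` and `responds_within 10 collie node recovery` would
show whether these two commands answer within 10 seconds. The 10 second
limit is an arbitrary choice.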
Thanks,
Yuan