[sheepdog-users] Cluster hung...
Bastian Scholz
nimrodxx at gmx.de
Thu Jul 26 16:26:07 CEST 2012
My thinking behind this is that a disk can fail. After replacing
it, I have to restart the sheep for that disk. If this is the
first sheep, it also handles the connected qemu instances,
which will die when that sheep process is killed.
To avoid this situation and allow me to (hot) replace failed
disks, I run a separate gateway sheep which is (nearly)
independent of any disks.
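Roughly like this (a sketch only; the ports and store paths are
made up, and the -g/--gateway flag for a storeless gateway sheep
may depend on your sheepdog version, so check your build):

  # Gateway sheep on the default port 7000; with -g it stores no
  # objects itself, so qemu connections are not tied to any one disk.
  sheep -g /var/lib/sheepdog/gateway

  # One sheep per physical disk, each on its own port; any of these
  # can be killed and restarted after a disk swap without touching
  # the gateway sheep.
  sheep -p 7001 /mnt/disk1
  sheep -p 7002 /mnt/disk2

  # Guests attach through the gateway only (sheepdog:host:port:vdiname).
  qemu-system-x86_64 -drive file=sheepdog:localhost:7000:vm1

That way only the per-disk sheeps ever need a restart.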
Cheers
Bastian
On 2012-07-26 16:00, David Douard wrote:
> On 26/07/2012 15:33, Bastian Scholz wrote:
>> Hi List,
>>
>> I have a small cluster of 3 nodes with 1 gateway each; one node
>> has only a single working sheep, and the other two nodes each
>> run three working sheep...
>
> Hi,
>
> just a question:
>
> why have a gateway on each node? Is it a recommended
> configuration to have a gateway on each node?
>
> David
>
>
>>
>> When a node fails, the recovery process starts as expected, but
>> when the failed node rejoins, the cluster hangs for a long time
>> without responding to many collie commands...
>> collie node info and collie node recovery don't give an answer
>> for at least 20 minutes.
>>
>> The connected kvm guests can't access the VDIs during this time,
>> and the Windows guests don't survive it...
>>
>> I am using sheepdog from sheepdog_0.4.0-0+tek2b-7_amd64.deb...
>>
>> Could someone briefly explain what happens here and whether I
>> can avoid this hang?
>>
>> Thanks
>>
>> Bastian
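For reference, the stuck commands quoted above can also be pointed
at one specific local sheep. A sketch, assuming collie's -p/--port
option and the example ports from my setup above:

  # Ask the gateway sheep for cluster-wide node info.
  collie node info -p 7000

  # Ask one of the per-disk sheeps about its recovery status.
  collie node recovery -p 7001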