[sheepdog-users] vdi stalled with 2 out of 3 node online

Liu Yuan namei.unix at gmail.com
Thu Mar 27 07:51:05 CET 2014


On Wed, Mar 26, 2014 at 09:20:32AM +0100, richter at ecos.de wrote:
> Hi,
> 
> Last Friday one of my three nodes left the cluster (I have no idea
> why). "dog node recovery" shows "Waiting for other nodes to join
> cluster".

I guess you are using corosync, which is said to run into network
partition problems easily.

> The other two nodes worked as expected. On Monday evening one VM stopped
> working (stopping and starting the VM didn't change anything either).
> 
> Now I have restarted sheepdog on the node that was no longer joining the
> cluster. It joined the cluster and did a recovery, and the VM on the other
> node, which didn't work before, now works as expected (and _no_ data was
> lost :-)
> 
> If I have a cluster with three nodes and the number of copies is set to
> three, why does a VM stop working if two nodes are up and running? Moreover,
> it is strange that it stopped after three days.
> 

If you don't pass '-t' to 'dog cluster format', the sheepdog cluster will run
without any problem even if the number of nodes is smaller than the redundancy
(number of copies).
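
As a rough illustration of that rule, here is a simplified model in Python. It
is not sheepdog's code: the function name accepts_writes is hypothetical, and
the sketch assumes that formatting with '-t' means writes are refused whenever
fewer nodes than copies are alive.

```python
def accepts_writes(alive_nodes: int, copies: int, strict: bool) -> bool:
    """Simplified model of the rule above (assumption, not sheepdog's code)."""
    if strict:
        # Assumed '-t' behaviour: require at least as many nodes as copies.
        return alive_nodes >= copies
    # Without '-t': keep serving as long as at least one node is alive.
    return alive_nodes >= 1


# The setup from this thread: 3 copies, 2 of 3 nodes online.
print(accepts_writes(2, 3, strict=False))  # True
print(accepts_writes(2, 3, strict=True))   # False
```

Under this model, a cluster formatted without '-t' would keep serving the VM
with two of three nodes, while one formatted with '-t' would stall until the
third node rejoins.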

Thanks
Yuan


