[sheepdog-users] sheep doesn't join the cluster on reboot

Liu Yuan namei.unix at gmail.com
Fri Jan 11 15:16:47 CET 2013


On 01/11/2013 10:00 PM, Valerio Pachera wrote:
> Hi, I've been able to reproduce this problem:
> 
> I have a node that had been out of the cluster for a long time.
> I chose to empty its data folder before joining the cluster again.
> I didn't run the daemon from a shell; instead I put
>   /usr/sbin/sheep /mnt/sheepdog
> in
>   /etc/rc.local
> and rebooted the node.
> 
> The daemon starts successfully but it doesn't join the cluster.
> 
> Node1
> root@sheepdog001:~# collie node list
> M   Id   Host:Port            V-Nodes        Zone
> -    0   192.168.2.41:7000         64   688040128
> 
> Node2
> root@sheepdog002:~# collie node list
> M   Id   Host:Port            V-Nodes        Zone
> -    0   192.168.2.42:7000         57   704817344
> -    1   192.168.2.43:7000        106   721594560
> -    2   192.168.2.215:7000        29  -687691584
> 
> root@sheepdog001:~# more /mnt/sheepdog/sheep.log
> Jan 11 14:41:34 [main] send_join_request(1014) IPv4 ip:192.168.2.41 port:7000
> Jan 11 14:41:34 [main] main(616) sheepdog daemon (version 0.5.5_36_g6101dbe) started
> Jan 11 14:41:34 [main] update_cluster_info(799) status = 2, epoch = 0, finished: 0
> 
> I also tried to postpone the daemon startup:
>   sleep 30; /usr/sbin/sheep /mnt/sheepdog
> but nothing changes.
> 
> If I reboot the node and then run
>   sheep /mnt/sheepdog
> the node joins the cluster correctly.
> 
> sheep -h 0.5.5_36_g6101dbe
> Debian wheezy 3.2.0-4-amd64
> 

I think this is more likely a corosync bug. Try checking the corosync
messages in /var/log/syslog and comparing the good and bad cases.
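
One possible way to do that comparison is sketched below. It assumes
the traditional syslog line format ("Mon DD HH:MM:SS host tag[pid]: msg")
and that corosync logs to /var/log/syslog; the function name
corosync_lines and the /tmp file names are only illustrative. It drops
the timestamp and the PID so that diff shows only real differences
between the two boots.

```shell
# Print only the corosync lines of a syslog-style file, with the leading
# "Mon DD HH:MM:SS " timestamp and the "[pid]" suffix removed, so that
# captures from two different boots can be compared with diff.
# Assumption: traditional syslog timestamps at the start of each line.
corosync_lines() {
    grep -i 'corosync' "$1" \
        | sed -E -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} //' \
                 -e 's/corosync\[[0-9]+\]/corosync/'
}

# Illustrative workflow (file names are hypothetical):
#   after booting with sheep started from /etc/rc.local:
#     corosync_lines /var/log/syslog > /tmp/corosync-bad.txt
#   after rebooting and starting sheep by hand:
#     corosync_lines /var/log/syslog > /tmp/corosync-good.txt
#   then:
#     diff -u /tmp/corosync-good.txt /tmp/corosync-bad.txt
```

Since /var/log/syslog accumulates across boots, it may be easier to
take each capture right after the reboot in question, or to cut the
file down to the relevant time window first.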

Thanks,
Yuan


