[sheepdog-users] strange behavior when deconfiguring a nic

Liu Yuan namei.unix at gmail.com
Fri Dec 21 10:39:52 CET 2012


On 12/21/2012 05:32 PM, Valerio Pachera wrote:
> I posted this issue before, but I think it deserves a thread of its own.
> 
> *If I deconfigure the nic of a node, it's impossible to get back the
> node in the cluster.*
> 
> On the 3rd node of my cluster:
>   ifdown eth0
> or
>   ip addr del 192.168.2.43/24 dev eth0
> 
> Once the IP is restored, I run the sheep daemon again.
> It starts and in sheep.log you can see this:
>   Dec 21 10:17:18 [main] jrnl_recover(230) opening the directory
> /mnt/sheepdog/journal/
>   Dec 21 10:17:18 [main] jrnl_recover(235) starting journal recovery
>   Dec 21 10:17:18 [main] jrnl_recover(291) journal recovery complete
>   Dec 21 10:17:18 [main] send_join_request(1014) IPv4 ip:192.168.2.43 port:7000
>   Dec 21 10:17:19 [main] main(616) sheepdog daemon (version
> 0.5.5_23_gfacdf48) started
>   Dec 21 10:17:21 [main] update_cluster_info(799) status = 4, epoch =
> 8, finished: 0
> 
> Nodes 1 and 2 don't show node 3 in 'cluster info'.
> Node 3 shows all nodes in 'cluster info' (the state of the cluster
> before it went offline).
> Node recovery does not start.
> 
> Killing and restarting the sheep daemon doesn't help.
> I also tried to restart corosync without luck.
> *After a reboot of the node, the sheep daemon is able to join the cluster!*
> 
> More details in the attached file.

Before restarting the sheep daemon, try restarting the corosync daemon
first. Please check whether this solves the problem.
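
The order would look roughly like the sketch below. It is only an
illustration, not a tested procedure: it assumes corosync is managed
through a "service" init script, that the sheep store directory is
/mnt/sheepdog (guessed from the journal path in the log above), and
that 'cluster info' is run through the collie tool. The run and
rejoin_node helpers and the DRY_RUN and STORE_DIR variables are
hypothetical names for this example.

```shell
#!/bin/sh
# Sketch only: restart corosync before sheep on the node that lost its IP.
# Assumptions (not confirmed in this thread): corosync has a "service"
# init script, and the sheep store directory is /mnt/sheepdog.
STORE_DIR="${STORE_DIR:-/mnt/sheepdog}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rejoin_node() {
    run pkill -x sheep || true      # stop any sheep daemon still running
    run service corosync restart    # restart corosync first
    run sheep "$STORE_DIR"          # then start sheep so it sends a join request
    run collie cluster info         # check whether the node is now listed
}
```

Running "DRY_RUN=1 rejoin_node" prints the commands without executing
them, so the sequence can be adjusted to the local setup first.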

Thanks,
Yuan


