[sheepdog-users] Problem rejoining cluster

Takashi Menjo menjo.takashi at lab.ntt.co.jp
Mon Jul 9 10:32:41 CEST 2018


Hello,


> I have to reboot a node and  then on this node sheepdog fail to start with this
> error:
> [..]
> Jul  9 09:12:00 node02 sheep[3769]: ERROR [main] zk_join(1022) Previous
> zookeeper session exist, shoot myself. Please wait for 30 seconds to join me
> again.


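Did you try to restart the node __within 30 seconds__ of it going down?
If so, please wait a while before trying again, as the ERROR log says.
The previous ZooKeeper session has to expire first; your log shows a
negotiated session timeout of 30000 (milliseconds, i.e. 30 seconds).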

Then run "dog node list" to check whether the restarted node appears in the
list. If it does __not__, you can start sheep on that node again so that it
rejoins your cluster.
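
The wait-then-check procedure above could be scripted roughly as follows.
This is only a sketch under assumptions, not a tested tool: the address
10.0.0.34:7001 and the 30-second timeout are taken from your log, the 5-second
margin is arbitrary, and node_in_list() and safe_to_start() are hypothetical
helper names. It also assumes that "dog node list" prints one line per node
containing the address as "ip:port", and that it is run on a host where dog
can reach a running sheep daemon.

```python
import subprocess
import time

SESSION_TIMEOUT_SEC = 30      # from the log: negotiated session timeout 30000 ms
MARGIN_SEC = 5                # arbitrary safety margin (assumption)
NODE_ADDR = "10.0.0.34:7001"  # from the log: ip:10.0.0.34 port:7001


def node_in_list(dog_output, addr):
    """Return True if addr appears as a whole token on any output line.

    Assumes each node line of `dog node list` contains "ip:port".
    """
    return any(addr in line.split() for line in dog_output.splitlines())


def safe_to_start(addr=NODE_ADDR, run=subprocess.run, sleep=time.sleep):
    """Wait out the old ZooKeeper session, then check the node list.

    Returns True when addr is no longer listed, i.e. when sheep can be
    started again on the rebooted node.
    """
    sleep(SESSION_TIMEOUT_SEC + MARGIN_SEC)
    result = run(["dog", "node", "list"],
                 capture_output=True, text=True, check=True)
    return not node_in_list(result.stdout, addr)
```

Calling safe_to_start() blocks for about 35 seconds and then returns True if
the rebooted node is absent from the list. In that case, start sheep on the
rebooted node again.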


Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center
<menjo.takashi at lab.ntt.co.jp>

> -----Original Message-----
> From: sheepdog-users [mailto:sheepdog-users-bounces at lists.wpkg.org] On
> Behalf Of Cristian Del Carlo
> Sent: Monday, July 9, 2018 4:17 PM
> To: sheepdog-users at lists.wpkg.org
> Subject: [sheepdog-users] Problem rejoining cluster
> 
> Hi,
> 
> I have a cluster with 4 node.
> 
> All nodes are installed with centos 7.x, zookeeper 3.4.6 and sheepdog 1.0.1.
> I have to reboot a node and  then on this node sheepdog fail to start with this
> error:
> 
> Jul  9 09:12:00 node02 sheep[3769]:  INFO [main] zk_init(1503) the negociated
> session timeout is 30000
> Jul  9 09:12:00 node02 sheep[3769]: NOTICE [main] get_local_addr(551) found
> IPv4 address
> Jul  9 09:12:00 node02 sheep[3769]:  INFO [main] send_join_request(1093) IPv4
> ip:10.0.0.34 port:7001 going to join the cluster
> Jul  9 09:12:00 node02 systemd: sheepdoghd.service never wrote its PID file.
> Failing.
> Jul  9 09:12:00 node02 sheep[3769]: ERROR [main] zk_join(1022) Previous
> zookeeper session exist, shoot myself. Please wait for 30 seconds to join me
> again.
> Jul  9 09:12:00 node02 systemd: Failed to start Sheepdog QEMU/KVM Block
> Storage.
> Jul  9 09:12:00 node02 systemd: Unit sheepdoghd.service entered failed state.
> Jul  9 09:12:00 node02 systemd: sheepdoghd.service failed.
> Jul  9 09:12:00 node02 systemd: sheepdoghd.service holdoff time over,
> scheduling restart.
> Jul  9 09:12:00 node02 systemd: Starting Sheepdog QEMU/KVM Block Storage...
> 
> 
> 
> Could you suggest me how to solve?
> 
> Thanks in advance for your advice.
> 
> Cristian
> 




