<div dir="ltr"><div style>Hello,</div><div style><br></div><div style>I'm having trouble adding a new node to an existing cluster. My current cluster consists of 4 servers, each running a gateway on port 7000 and a data node on port 7001, on sheepdog version 0.5.6.</div>
<div style><br></div><div style>The current 'cluster info' and 'node list' output is:</div><div style><br></div><div style><div>Cluster status: running</div><div><br></div><div>Cluster created at Fri Feb 8 21:19:59 2013</div>
<div><br></div><div>Epoch Time Version</div><div>2013-02-13 17:27:22 11 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-13 17:26:15 10 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-13 16:19:14 9 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-13 16:19:14 8 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-13 12:50:48 7 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-13 03:43:46 6 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>]</div>
<div>2013-02-13 00:07:55 5 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>]</div>
<div>2013-02-13 00:06:45 4 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>]</div>
<div>2013-02-08 21:21:32 3 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-08 21:21:07 2 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div>2013-02-08 21:20:00 1 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>]</div>
<div><br></div><div><br></div><div><div>M Id Host:Port V-Nodes Zone</div><div>- 0 <a href="http://172.16.0.91:7000">172.16.0.91:7000</a> 0 1526730924</div><div>- 1 <a href="http://172.16.0.91:7001">172.16.0.91:7001</a> 73 1526730924</div>
<div>- 2 <a href="http://172.16.0.92:7000">172.16.0.92:7000</a> 0 1543508140</div><div>- 3 <a href="http://172.16.0.92:7001">172.16.0.92:7001</a> 22 1543508140</div><div>- 4 <a href="http://172.16.0.93:7000">172.16.0.93:7000</a> 0 1560285356</div>
<div>- 5 <a href="http://172.16.0.93:7001">172.16.0.93:7001</a> 74 1560285356</div><div>- 6 <a href="http://172.16.0.94:7000">172.16.0.94:7000</a> 0 1577062572</div><div>- 7 <a href="http://172.16.0.94:7001">172.16.0.94:7001</a> 87 1577062572</div>
</div><div><br></div><div>==============================</div><div><br></div><div style>Now I want to add a new server to the cluster. I first started the gateway with '/opt/sheep/sbin/sheep -g /vz/sheep-gw'. It joined the cluster without any problem, and the cluster epoch was incremented to 12:</div>
<div style><br></div><div style><div>2013-02-14 21:47:53 12 [<a href="http://172.16.0.91:7000">172.16.0.91:7000</a>, <a href="http://172.16.0.91:7001">172.16.0.91:7001</a>, <a href="http://172.16.0.92:7000">172.16.0.92:7000</a>, <a href="http://172.16.0.92:7001">172.16.0.92:7001</a>, <a href="http://172.16.0.93:7000">172.16.0.93:7000</a>, <a href="http://172.16.0.93:7001">172.16.0.93:7001</a>, <a href="http://172.16.0.94:7000">172.16.0.94:7000</a>, <a href="http://172.16.0.94:7001">172.16.0.94:7001</a>, <a href="http://172.16.0.95:7000">172.16.0.95:7000</a>]</div>
<div><br></div></div></div><div style>The data node, however, does not behave the same way. I start the sheepdog daemon with '/opt/sheep/sbin/sheep -s 900000 -p 7001 /vz/sheep-data', but the process cannot join the cluster. I tried starting it 3 times; the log is below. The cluster epoch increased by 6 in total (one node join plus one node leave for each of the 3 attempts).</div>
<div style><br></div><div style><div>Feb 14 21:57:55 [main] jrnl_recover(230) opening the directory /vz/sheep-data/journal/</div><div>Feb 14 21:57:55 [main] jrnl_recover(235) starting journal recovery</div><div>Feb 14 21:57:55 [main] jrnl_recover(291) journal recovery complete</div>
<div>Feb 14 21:57:55 [main] init_signal(171) register signal_handler for 10</div><div>Feb 14 21:57:55 [main] init_disk_space(371) disk free space is 943718400000</div><div>Feb 14 21:57:55 [main] create_cluster(1134) use corosync cluster driver as default</div>
<div>Feb 14 21:57:55 [main] create_cluster(1163) zone id = 1593839788</div><div>Feb 14 21:57:55 [main] send_join_request(998) IPv4 ip:172.16.0.95 port:7001</div><div>Feb 14 21:57:55 [main] check_host_env(419) Allowed core file size 0, suggested unlimited</div>
<div>Feb 14 21:57:55 [main] main(690) sheepdog daemon (version 0.5.6) started</div><div>Feb 14 21:57:55 [main] cdrv_cpg_confchg(579) mem:10, joined:1, left:0</div><div>Feb 14 21:57:55 [main] cdrv_cpg_confchg(656) Not promoting because member is not in our event list.</div>
<div>Feb 14 21:57:55 [main] cdrv_cpg_deliver(472) 0</div><div>Feb 14 21:57:55 [main] cdrv_cpg_deliver(472) 1</div><div>Feb 14 21:57:55 [main] sd_join_handler(1028) join IPv4 ip:172.16.0.95 port:7001</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [0] IPv4 ip:172.16.0.91 port:7000</div>
<div>Feb 14 21:57:55 [main] sd_join_handler(1030) [1] IPv4 ip:172.16.0.93 port:7000</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [2] IPv4 ip:172.16.0.94 port:7000</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [3] IPv4 ip:172.16.0.91 port:7001</div>
<div>Feb 14 21:57:55 [main] sd_join_handler(1030) [4] IPv4 ip:172.16.0.93 port:7001</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [5] IPv4 ip:172.16.0.94 port:7001</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [6] IPv4 ip:172.16.0.92 port:7000</div>
<div>Feb 14 21:57:55 [main] sd_join_handler(1030) [7] IPv4 ip:172.16.0.92 port:7001</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [8] IPv4 ip:172.16.0.95 port:7000</div><div>Feb 14 21:57:55 [main] sd_join_handler(1030) [9] IPv4 ip:172.16.0.95 port:7001</div>
<div>Feb 14 21:57:55 [main] update_cluster_info(783) status = 1, epoch = 18, finished: 0</div><div>Feb 14 21:57:55 [main] crash_handler(322) sheep pid 6965 exited unexpectedly.</div><div><br></div></div><div><br></div><div>
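Reading the epoch history by eye is error-prone, so here is a minimal Python sketch that diffs consecutive epochs and reports which node joined or left at each step. It assumes the 'cluster info' output above has been copied as plain text with the HTML links stripped. The helpers parse_epochs and membership_changes are hypothetical illustrations and are not part of sheepdog.

```python
import re

# Epoch history from the 'cluster info' output above, copied as plain text
# (assumption: HTML links stripped; newest epoch first, as printed).
HISTORY = """\
2013-02-14 21:47:53 12 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001, 172.16.0.95:7000]
2013-02-13 17:27:22 11 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-13 17:26:15 10 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-13 16:19:14 9 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-13 16:19:14 8 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-13 12:50:48 7 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-13 03:43:46 6 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000]
2013-02-13 00:07:55 5 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001]
2013-02-13 00:06:45 4 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000]
2013-02-08 21:21:32 3 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-08 21:21:07 2 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
2013-02-08 21:20:00 1 [172.16.0.91:7000, 172.16.0.91:7001, 172.16.0.92:7000, 172.16.0.92:7001, 172.16.0.93:7000, 172.16.0.93:7001, 172.16.0.94:7000, 172.16.0.94:7001]
"""

# "<date> <time> <epoch> [<member>, <member>, ...]"
LINE_RE = re.compile(r"^\S+ \S+ (\d+) \[(.*)\]$")


def parse_epochs(text):
    """Map each epoch number to the set of 'ip:port' members it lists."""
    epochs = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            members = {n.strip() for n in m.group(2).split(",") if n.strip()}
            epochs[int(m.group(1))] = members
    return epochs


def membership_changes(epochs):
    """Yield (epoch, joined, left) for each epoch relative to the previous one."""
    nums = sorted(epochs)
    for prev, cur in zip(nums, nums[1:]):
        joined = sorted(epochs[cur] - epochs[prev])
        left = sorted(epochs[prev] - epochs[cur])
        yield cur, joined, left


if __name__ == "__main__":
    for epoch, joined, left in membership_changes(parse_epochs(HISTORY)):
        print(f"epoch {epoch}: joined={joined} left={left}")
```

On the history above, this shows that 172.16.0.92 and 172.16.0.94 each dropped out of the cluster and rejoined at least once before the new gateway 172.16.0.95:7000 joined at epoch 12.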
<br></div><div style>Here is the strace output for the command 'strace /opt/sheep/sbin/sheep -s 900000 -f -d -p 7001 /vz/sheep-data' (running in the foreground with debug output):</div><div style><br></div><div style><a href="http://pastebin.com/ej7787XC">http://pastebin.com/ej7787XC</a><br>
</div><div><br></div><div style>By the way, there is no data loss in the cluster; the only problem is that I can't add the new node.</div><div style><br></div><div style><br></div><div><br></div><div><br></div>-- <br>Personal hosting by icez network<br>
<a href="http://www.thzhost.com">http://www.thzhost.com</a>
</div>