<div dir="ltr"><div><div><div>Hi, I wanted to check whether my test cluster would have problems when adding a node with a single NIC (the other nodes use two).<br></div><div>I upgraded to the latest sheepdog version: 0.7.0_197_g9f718d2.<br>
</div><div><br></div>I removed a node (test006) with 'dog node kill' and waited for the recovery to finish.<br><br></div>When the recovery had completed, I was about to re-add the node, but I noticed that 'dog node info' was showing only a single node!<br>
<br></div><div>This is bizarre!<br></div><div>sheep is alive on test004, test005 and test007, but only test007 is in the cluster.<br></div><div>Corosync is alive on all nodes.<br></div><div><br>2013-12-12 16:15:14 35 [192.168.2.47:7000]<br>
2013-12-12 16:14:34 34 [192.168.2.45:7000, 192.168.2.47:7000]<br>2013-12-12 16:06:45 33 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.47:7000]<br>
2013-12-11 14:38:48 32 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.46:7000, 192.168.2.47:7000]<br>
<br>Unfortunately, I removed all the sheep.log files before noticing this issue.<br><br></div><div>Here is some other information I could collect (from test004):<br></div><div><br>
</div><div>/var/log/messages<br></div><div>Dec 12 16:01:20 test004 sheep: logger pid 6792 starting<br>Dec 12 16:02:24 test004 sheep: logger pid 6792 stopped<br>Dec 12 16:05:43 test004 sheep: logger pid 6873 starting<br><br>
</div><div>/var/log/syslog<br>Dec 12 16:01:20 test004 sheep: logger pid 6792 starting<br>Dec 12 16:02:01 test004 /USR/SBIN/CRON[6807]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)<br>
Dec 12 16:02:24 test004 sheep: logger pid 6792 stopped<br>Dec 12 16:03:01 test004 /USR/SBIN/CRON[6822]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)<br>Dec 12 16:04:01 test004 /USR/SBIN/CRON[6828]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)<br>
Dec 12 16:05:01 test004 /USR/SBIN/CRON[6835]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)<br>Dec 12 16:05:01 test004 /USR/SBIN/CRON[6836]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)<br>
Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Unloading all Corosync service engines.<br>Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service<br>Dec 12 16:05:27 test004 corosync[3110]: [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>
Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync configuration service<br>Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01<br>
Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01<br>Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync profile loading service<br>
Dec 12 16:05:27 test004 corosync[3110]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1<br>Dec 12 16:05:27 test004 corosync[3110]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1939.<br>
Dec 12 16:05:41 test004 corosync[6865]: [MAIN ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.<br>Dec 12 16:05:41 test004 corosync[6865]: [MAIN ] Corosync built-in features:<br>Dec 12 16:05:41 test004 corosync[6865]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.<br>
Dec 12 16:05:41 test004 corosync[6865]: [TOTEM ] Initializing transport (UDP/IP Multicast).<br>Dec 12 16:05:41 test004 corosync[6865]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).<br>
Dec 12 16:05:41 test004 corosync[6865]: [TOTEM ] The network interface [192.168.2.44] is now up.<br>Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service failed to load 'pacemaker'.<br>Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync extended virtual synchrony service<br>
Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync configuration service<br>Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01<br>
Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync cluster config database access v1.01<br>Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync profile loading service<br>
Dec 12 16:05:41 test004 corosync[6865]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1<br>Dec 12 16:05:41 test004 corosync[6865]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.<br>
Dec 12 16:05:41 test004 corosync[6865]: [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>Dec 12 16:05:41 test004 corosync[6865]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:0 left:0)<br>
Dec 12 16:05:41 test004 corosync[6865]: [MAIN ] Completed service synchronization, ready to provide service.<br>Dec 12 16:05:43 test004 sheep: logger pid 6873 starting<br>Dec 12 16:05:45 test004 corosync[6865]: [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>
Dec 12 16:05:45 test004 corosync[6865]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:1 left:0)<br>Dec 12 16:05:45 test004 corosync[6865]: [MAIN ] Completed service synchronization, ready to provide service.<br>
Dec 12 16:05:48 test004 corosync[6865]: [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>Dec 12 16:05:48 test004 corosync[6865]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:2 left:0)<br>
Dec 12 16:05:48 test004 corosync[6865]: [MAIN ] Completed service synchronization, ready to provide service.<br>Dec 12 16:05:52 test004 corosync[6865]: [TOTEM ] A processor joined or left the membership and a new membership was formed.<br>
Dec 12 16:05:52 test004 corosync[6865]: [CPG ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:3 left:0)<br>Dec 12 16:05:52 test004 corosync[6865]: [MAIN ] Completed service synchronization, ready to provide service.<br>
</div><div><br><br></div></div>