<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div>Check this please<br><br>dog cluster info<br>Cluster status: running, auto-recovery enabled<br>Cluster created at Fri Jul  4 11:59:27 2014<br>Epoch Time           Version<br>
2014-08-07 21:13:26      9 [<a href="http://192.168.5.23:7000">192.168.5.23:7000</a>, <a href="http://192.168.5.44:7000">192.168.5.44:7000</a>, <a href="http://192.168.5.45:7000">192.168.5.45:7000</a>]<br>2014-08-07 21:13:04      8 [<a href="http://192.168.5.44:7000">192.168.5.44:7000</a>, <a href="http://192.168.5.45:7000">192.168.5.45:7000</a>]<br>
2014-07-08 09:41:32      7 [<a href="http://192.168.5.23:7000">192.168.5.23:7000</a>, <a href="http://192.168.5.44:7000">192.168.5.44:7000</a>, <a href="http://192.168.5.45:7000">192.168.5.45:7000</a>]<br><br></div>As you can see, node 192.168.5.23 left the cluster and rejoined it about 20 seconds later without any manual intervention.<br>
<br></div>sheep.log is empty on all 3 nodes!<br><br></div>This is part of the zookeeper.log of the disconnected node:<br><br>2014-08-07 21:12:46,776 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader<br>
java.net.SocketTimeoutException: Read timed out<br>        at java.net.SocketInputStream.socketRead0(Native Method)<br>        at java.net.SocketInputStream.read(SocketInputStream.java:146)<br>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)<br>
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)<br>        at java.io.DataInputStream.readInt(DataInputStream.java:387)<br>        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)<br>
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)<br>        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)<br>        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)<br>
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)<br>        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)<br>2014-08-07 21:12:46,998 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called<br>
java.lang.Exception: shutdown Follower<br>        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)<br>        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)<br>2014-08-07 21:12:47,022 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete<br>
2014-08-07 21:12:47,030 - INFO  [FollowerRequestProcessor:23:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop!<br>2014-08-07 21:12:47,022 - INFO  [CommitProcessor:23:CommitProcessor@148] - CommitProcessor exited loop!<br>
2014-08-07 21:12:47,058 - INFO  [SyncThread:23:SyncRequestProcessor@151] - SyncRequestProcessor exited!<br>2014-08-07 21:12:47,080 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING<br></div><cut><br>
2014-08-07 21:13:15,328 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 44 (n.leader), 8589934657 (n.zxid), 1 (n.round), FOLLOWING (n.state), 45 (n.sid), FOLLOWING (my state)<br>2014-08-07 21:13:15,328 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING<br>
2014-08-07 21:13:15,339 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2<br>
2014-08-07 21:13:15,372 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@291] - Getting a diff from the leader 0x3000006cb<br>2014-08-07 21:13:15,373 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 3<br>
2014-08-07 21:13:15,373 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@342] - Got zxid 0x3000006c7 expected 0x1<br>2014-08-07 21:13:15,374 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 3000006cb<br>
2014-08-07 21:13:22,685 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251">0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251</a>] - Accepted socket connection from /<a href="http://192.168.6.23:41799">192.168.6.23:41799</a><br>
2014-08-07 21:13:22,692 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770</a>] - Client attempting to renew session 0x1746fde1e20b0001 at /<a href="http://192.168.6.23:41799">192.168.6.23:41799</a><br>
2014-08-07 21:13:22,701 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:Learner@103">0.0.0.0/0.0.0.0:2181:Learner@103</a>] - Revalidating client: 1677307057843929089<br>2014-08-07 21:13:22,754 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1573] - Invalid session 0x1746fde1e20b0001 for client /<a href="http://192.168.6.23:41799">192.168.6.23:41799</a>, probably expired<br>
2014-08-07 21:13:22,761 - WARN  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634</a>] - EndOfStreamException: Unable to read additional data from client sessionid 0x1746fde1e20b0001, likely client has closed socket<br>
2014-08-07 21:13:22,763 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3000006cc expected 0x1<br>2014-08-07 21:13:22,778 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435</a>] - Closed socket connection for client /<a href="http://192.168.6.23:41799">192.168.6.23:41799</a> which had sessionid 0x1746fde1e20b0001<br>
2014-08-07 21:18:01,081 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251">0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251</a>] - Accepted socket connection from /<a href="http://192.168.6.2:45462">192.168.6.2:45462</a><br>
2014-08-07 21:18:01,081 - WARN  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634</a>] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket<br>
<br></div>I think there were some problems with the switches, for unknown reasons.<br></div>What do you think?<br><br></div>Then I noticed a second "problem".<br></div>zookeeper.log contains 24,000 entries like these:<br>
<br>2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251">0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251</a>] - Accepted socket connection from /<a href="http://192.168.6.2:47943">192.168.6.2:47943</a><br>
2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435</a>] - Closed socket connection for client /<a href="http://192.168.6.2:47943">192.168.6.2:47943</a> (no session established for client)<br>
2014-08-07 22:03:01,153 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251">0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251</a>] - Accepted socket connection from /<a href="http://192.168.6.2:48286">192.168.6.2:48286</a><br>
2014-08-07 22:03:01,153 - INFO  [NIOServerCxn.Factory:<a href="http://0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435">0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435</a>] - Closed socket connection for client /<a href="http://192.168.6.2:48286">192.168.6.2:48286</a> (no session established for client)<br>
<br></div>Why is it trying to connect to this host (192.168.6.2)?<br></div>This host is neither a ZooKeeper nor a Sheepdog node.<br><br>I checked every /etc/zookeeper/conf/zoo.cfg and 192.168.6.2 does not appear anywhere!<br><br>
</div>Thank you.<br></div>