[sheepdog-users] [sheep-users] Node exit and join the cluster with no log

Valerio Pachera sirio81 at gmail.com
Thu Aug 7 22:09:10 CEST 2014


Please check this:

dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Fri Jul  4 11:59:27 2014
Epoch Time           Version
2014-08-07 21:13:26      9 [192.168.5.23:7000, 192.168.5.44:7000, 192.168.5.45:7000]
2014-08-07 21:13:04      8 [192.168.5.44:7000, 192.168.5.45:7000]
2014-07-08 09:41:32      7 [192.168.5.23:7000, 192.168.5.44:7000, 192.168.5.45:7000]

As you can see, node 192.168.5.23 left the cluster and rejoined it about 20
seconds later without any manual intervention.
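To make the epoch arithmetic explicit, here is a small Python sketch that diffs consecutive epochs from the output above. It assumes the epoch lines have been copied into a list by hand; the `EPOCHS` list and the `membership_changes` helper are hypothetical and are not part of sheepdog or `dog`.

```python
from datetime import datetime

# Epoch history copied by hand from the `dog cluster info` output above,
# oldest first: (timestamp, epoch number, member list).
EPOCHS = [
    ("2014-07-08 09:41:32", 7, ["192.168.5.23:7000", "192.168.5.44:7000", "192.168.5.45:7000"]),
    ("2014-08-07 21:13:04", 8, ["192.168.5.44:7000", "192.168.5.45:7000"]),
    ("2014-08-07 21:13:26", 9, ["192.168.5.23:7000", "192.168.5.44:7000", "192.168.5.45:7000"]),
]

def membership_changes(epochs):
    """Return (epoch, left, joined, seconds since previous epoch) for each epoch change."""
    fmt = "%Y-%m-%d %H:%M:%S"
    changes = []
    for (t0, _, m0), (t1, e1, m1) in zip(epochs, epochs[1:]):
        gap = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
        left = sorted(set(m0) - set(m1))    # members present before, missing now
        joined = sorted(set(m1) - set(m0))  # members missing before, present now
        changes.append((e1, left, joined, int(gap.total_seconds())))
    return changes

if __name__ == "__main__":
    for epoch, left, joined, gap in membership_changes(EPOCHS):
        print(f"epoch {epoch}: left={left} joined={joined} (+{gap}s)")
```

Epoch 8 drops 192.168.5.23 and epoch 9 adds it back 22 seconds later, which is the "about 20 seconds" gap.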

sheep.log is empty on all three nodes!

This is part of zookeeper.log from the disconnected node:

2014-08-07 21:12:46,776 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:146)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2014-08-07 21:12:46,998 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
2014-08-07 21:12:47,022 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete
2014-08-07 21:12:47,030 - INFO  [FollowerRequestProcessor:23:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop!
2014-08-07 21:12:47,022 - INFO  [CommitProcessor:23:CommitProcessor@148] - CommitProcessor exited loop!
2014-08-07 21:12:47,058 - INFO  [SyncThread:23:SyncRequestProcessor@151] - SyncRequestProcessor exited!
2014-08-07 21:12:47,080 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING
<cut>
2014-08-07 21:13:15,328 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 44 (n.leader), 8589934657 (n.zxid), 1 (n.round), FOLLOWING (n.state), 45 (n.sid), FOLLOWING (my state)
2014-08-07 21:13:15,328 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING
2014-08-07 21:13:15,339 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2014-08-07 21:13:15,372 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@291] - Getting a diff from the leader 0x3000006cb
2014-08-07 21:13:15,373 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 3
2014-08-07 21:13:15,373 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@342] - Got zxid 0x3000006c7 expected 0x1
2014-08-07 21:13:15,374 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 3000006cb
2014-08-07 21:13:22,685 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.23:41799
2014-08-07 21:13:22,692 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew session 0x1746fde1e20b0001 at /192.168.6.23:41799
2014-08-07 21:13:22,701 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@103] - Revalidating client: 1677307057843929089
2014-08-07 21:13:22,754 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1573] - Invalid session 0x1746fde1e20b0001 for client /192.168.6.23:41799, probably expired
2014-08-07 21:13:22,761 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1746fde1e20b0001, likely client has closed socket
2014-08-07 21:13:22,763 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3000006cc expected 0x1
2014-08-07 21:13:22,778 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /192.168.6.23:41799 which had sessionid 0x1746fde1e20b0001
2014-08-07 21:18:01,081 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.2:45462
2014-08-07 21:18:01,081 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
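The zxid values in this log are easier to compare when split into their two halves. In ZooKeeper a zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch transaction counter. The Python sketch below (the `split_zxid` helper is hypothetical) decodes the values seen above: 0x3000006cb belongs to epoch 3, matching the "Setting leader epoch 3" line, while the election notification's 8589934657 is 0x200000041, i.e. epoch 2, counter 65.

```python
def split_zxid(zxid: int) -> tuple:
    """Split a ZooKeeper zxid into (epoch, counter).

    High 32 bits: leader epoch. Low 32 bits: transaction counter within that epoch.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF

# zxid values copied from the zookeeper.log excerpt above.
ZXIDS = [
    ("election notification (n.zxid)", 8589934657),
    ("diff from leader", 0x3000006CB),
    ("Got zxid (Learner@342)", 0x3000006C7),
    ("Got zxid (Follower@116)", 0x3000006CC),
]

if __name__ == "__main__":
    for label, zxid in ZXIDS:
        epoch, counter = split_zxid(zxid)
        print(f"{label}: 0x{zxid:x} -> epoch {epoch}, counter {counter}")
```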

I think there were some problems with the switches, for unknown reasons.
What do you think?
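The timings in the log are consistent with a ZooKeeper session expiry after a network outage. ZooKeeper clamps the session timeout a client requests into [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime (4000 and 40000 ms here, as the "Created server" line shows). If the client cannot reconnect within the negotiated timeout, the session expires and ZooKeeper drops its ephemeral znodes; if sheep tracks membership that way (an assumption, not checked here), that would look exactly like the node leaving. The sketch assumes the outage began no later than the follower's "Read timed out" entry, and the 30000 ms requested timeout is a hypothetical placeholder, since the log does not show what sheep actually requested.

```python
from datetime import datetime

# Values printed by ZooKeeperServer in the log above (milliseconds).
TICK_TIME = 2000
MIN_SESSION_TIMEOUT = 4000    # ZooKeeper default: 2 * tickTime
MAX_SESSION_TIMEOUT = 40000   # ZooKeeper default: 20 * tickTime

def negotiated_timeout(requested_ms: int) -> int:
    """Clamp a client's requested session timeout into the server's bounds."""
    return max(MIN_SESSION_TIMEOUT, min(requested_ms, MAX_SESSION_TIMEOUT))

def gap_seconds(start: str, end: str) -> float:
    """Seconds between two zookeeper.log timestamps ("YYYY-MM-DD HH:MM:SS,mmm")."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# Lower bound on the outage: follower read timeout -> client's session-renew attempt.
outage = gap_seconds("2014-08-07 21:12:46,776", "2014-08-07 21:13:22,692")

# HYPOTHETICAL: the timeout sheep requests is not in the log; 30000 ms is only an example.
REQUESTED_MS = 30000

if __name__ == "__main__":
    timeout_s = negotiated_timeout(REQUESTED_MS) / 1000
    print(f"outage >= {outage:.3f}s, session timeout {timeout_s}s, "
          f"would expire: {outage > timeout_s}")
```

With any negotiated timeout below roughly 36 seconds the session would already be gone when the client came back, which matches the "Invalid session ... probably expired" line.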

Then I noticed a second "problem".
zookeeper.log contains 24,000 entries like these:

2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.2:47943
2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /192.168.6.2:47943 (no session established for client)
2014-08-07 22:03:01,153 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.2:48286
2014-08-07 22:03:01,153 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /192.168.6.2:48286 (no session established for client)

Why is it trying to connect to this host (192.168.6.2)?
This host is neither a ZooKeeper node nor a Sheepdog node.

I checked /etc/zookeeper/conf/zoo.cfg on all nodes, and 192.168.6.2 does not
appear anywhere!
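Note that these lines record incoming connections: "Accepted socket connection from /192.168.6.2" means that 192.168.6.2 opened a TCP connection to port 2181 and closed it without establishing a session, so there is no reason for zoo.cfg (which lists the quorum peers) to mention it. The excerpt's timestamps are five minutes apart, which hints at something polling the port on a schedule; that is a guess the log alone cannot confirm. The following Python sketch counts accepted connections per source IP and the spacing between them. It assumes one log entry per line, as ZooKeeper writes them; `accepted_connections`, `summarize` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter
from datetime import datetime

# Matches zookeeper.log "Accepted socket connection" entries and captures timestamp and source IP.
ACCEPT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Accepted socket connection from /([\d.]+):\d+"
)

def accepted_connections(lines):
    """Return (timestamp, source IP) for every accepted client connection."""
    out = []
    for line in lines:
        m = ACCEPT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            out.append((ts, m.group(2)))
    return out

def summarize(lines):
    """Return (connections per IP, seconds between consecutive connections per IP)."""
    conns = accepted_connections(lines)
    per_ip = Counter(ip for _, ip in conns)
    intervals, last_seen = {}, {}
    for ts, ip in conns:
        if ip in last_seen:
            intervals.setdefault(ip, []).append((ts - last_seen[ip]).total_seconds())
        last_seen[ip] = ts
    return per_ip, intervals

# Sample lines taken from the excerpt above.
SAMPLE = [
    "2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.2:47943",
    "2014-08-07 21:58:01,144 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /192.168.6.2:47943 (no session established for client)",
    "2014-08-07 22:03:01,153 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /192.168.6.2:48286",
]

if __name__ == "__main__":
    per_ip, intervals = summarize(SAMPLE)
    print(per_ip)
    print(intervals)
```

On a real node one would pass an open file object (for example the zookeeper.log from /var/log/zookeeper, if that is where it lives on the system) instead of `SAMPLE`; a steady interval close to 300 seconds for a single IP would support the polling theory.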

Thank you.