ZooKeeper Cluster Failed (and did not recover)

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

ZooKeeper Cluster Failed (and did not recover)

Jim Wong
Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server configuration and discovery.  This morning, we experienced an outage in which the entire cluster went offline for approximately 7 minutes.  Restarting the ZooKeeper daemons on each node in the cluster got things working again, so there doesn’t appear to have been an underlying network or hardware fault that would explain the cluster’s inability to recover.

This is actually the second time this has happened in the last several months, and all of our attempts to reproduce the behavior by intentionally doing bad things to the cluster or the leader have failed.

With all that in mind, we were wondering if anyone had any thoughts as to what might be going on, as we’re out of ideas.  If anyone can provide any hints or guidance, we’d be very grateful.

I’ve included brief excerpts from logs from some of the nodes; the problem started around 17:32:12 and wasn’t resolved until 17:39, after we’d restarted ZooKeeper.  This was was originally the leader:

Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4 cport:-1)::PrepRequestProcessor@486] - Processed session termination for sessionid: 0x15fdd1d1e0c0313
Nov 29 19:32:11 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:11 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:11 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:12 localhost zookeeper: ERROR [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
Nov 29 19:32:12 localhost zookeeper: WARN  [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE /10.1.29.76:49198 ********
Nov 29 19:32:12 localhost zookeeper: ERROR [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
Nov 29 19:32:12 localhost zookeeper: ERROR [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
Nov 29 19:32:12 localhost zookeeper: WARN  [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE /10.1.0.252:50932 ********
Nov 29 19:32:12 localhost zookeeper: WARN  [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE /10.1.48.67:44882 ********
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my state)
Nov 29 19:32:12 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:12 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:12 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my state)
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 3,4 ]
Nov 29 19:32:12 localhost     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
Nov 29 19:32:12 localhost     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
Nov 29 19:32:12 localhost zookeeper: INFO  [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331] - exception while shutting down acceptor: java.net.SocketException: Socket closed
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.1.10.8:42938 which had sessionid 0x45f3a424dd20208
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4 cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - shutdown of request processor complete
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
Nov 29 19:32:12 localhost zookeeper: INFO  [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
Nov 29 19:32:12 localhost zookeeper: WARN  [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE /10.1.37.117:52956 ********
Nov 29 19:32:12 localhost zookeeper: WARN  [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring unexpected exception
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New election. My id =  4, proposed zxid=0x2b0260b5a8
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to 10.1.64.1 port 67 (xid=0x79592ad4)
Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1 (xid=0x79592ad4)
Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 -- renewal in 1608 seconds.
Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases of eth0
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 400
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:13 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:13 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 800
Nov 29 19:32:13 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:13 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:13 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:14 localhost zookeeper: INFO  [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
Nov 29 19:32:14 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 1600
Nov 29 19:32:14 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:15 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 3200
Nov 29 19:32:15 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:18 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 6400
Nov 29 19:32:18 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0, interval 125280ms.
Nov 29 19:32:25 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 12800
Nov 29 19:32:25 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:38 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 25600
Nov 29 19:32:38 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:33:03 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 51200
Nov 29 19:33:03 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:33:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:33:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:34:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:34:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:35:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:35:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:36:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:36:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:37:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:37:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:38:54 localhost zookeeper: INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:38:54 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: WARN  [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for id 5, my id = 4, error =
Nov 29 19:39:45 localhost zookeeper: WARN  [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
Nov 29 19:39:45 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
Nov 29 19:39:45 localhost zookeeper: WARN  [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using channel: for id 5 my id = 4 error = java.net.SocketException: Socket closed
Nov 29 19:39:45 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
Nov 29 19:39:45 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 localhost zookeeper: WARN  [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving thread
Nov 29 19:39:47 localhost zookeeper: INFO  [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746] - Received connection request /10.1.48.67:58398
Nov 29 19:39:47 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:47 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:47 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:47 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:47 localhost zookeeper: INFO  [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service 'zookeeper-server' on user request


And here are logs from one of the followers:

Nov 29 19:32:11 localhost zookeeper: INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x15fc2e9562e0187
Nov 29 19:32:12 localhost zookeeper: WARN  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.1.29.33:57934 which had sessionid 0x35f3a424dc802a2
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [FollowerRequestProcessor:5:FollowerRequestProcessor@97] - FollowerRequestProcessor exited loop!
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - shutdown of request processor complete
Nov 29 19:32:12 localhost zookeeper: INFO  [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - Shutting down
Nov 29 19:32:12 localhost zookeeper: INFO  [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New election. My id =  5, proposed zxid=0x2b0260b5a8
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:12 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 400
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:12 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 800
Nov 29 19:32:12 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:13 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 1600
Nov 29 19:32:13 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:15 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 3200
Nov 29 19:32:15 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:18 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 6400
Nov 29 19:32:18 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:24 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 12800
Nov 29 19:32:24 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:32:37 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 25600
Nov 29 19:32:37 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:33:03 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 51200
Nov 29 19:33:03 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:33:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:33:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:34:42 localhost zookeeper: WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
Nov 29 19:34:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:34:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:35:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:35:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:36:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:36:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:37:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:37:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:38:54 localhost zookeeper: INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
Nov 29 19:38:54 localhost zookeeper: INFO  [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1 (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my state)
Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service 'zookeeper-server' on user request


Reply | Threaded
Open this post in threaded view
|

Re: ZooKeeper Cluster Failed (and did not recover)

Abraham Fine
Hi Jim-

What version of ZooKeeper are you using?

It would be great to get a better understanding of what is causing:
Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
Not sufficient followers synced, only synced with sids: [ 3,4 ]
Nov 29 19:32:12 localhost     at
org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
Nov 29 19:32:12 localhost     at
org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)

This failure appears to align with:
Nov 29 19:32:12 localhost zookeeper: WARN
[QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader

Is there any possibility debug logs could be enabled?

Thanks,
Abe

Would you be able to turn on debug logging, I think that might give us
much more visibility?
On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote:

> Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server
> configuration and discovery.  This morning, we experienced an outage in
> which the entire cluster went offline for approximately 7 minutes.
> Restarting the ZooKeeper daemons on each node in the cluster got things
> working again, so there doesn’t appear to have been an underlying network
> or hardware fault that would explain the cluster’s inability to recover.
>
> This is actually the second time this has happened in the last several
> months, and all of our attempts to reproduce the behavior by
> intentionally doing bad things to the cluster or the leader have failed.
>
> With all that in mind, we were wondering if anyone had any thoughts as to
> what might be going on, as we’re out of ideas.  If anyone can provide any
> hints or guidance, we’d be very grateful.
>
> I’ve included brief excerpts from logs from some of the nodes; the
> problem started around 17:32:12 and wasn’t resolved until 17:39, after
> we’d restarted ZooKeeper.  This was was originally the leader:
>
> Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4
> cport:-1)::PrepRequestProcessor@486] - Processed session termination for
> sessionid: 0x15fdd1d1e0c0313
> Nov 29 19:32:11 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:11 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:11 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:12 localhost zookeeper: ERROR
> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected
> exception causing shutdown while sock still open
> Nov 29 19:32:12 localhost zookeeper: WARN
> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE
> /10.1.29.76:49198 ********
> Nov 29 19:32:12 localhost zookeeper: ERROR
> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected
> exception causing shutdown while sock still open
> Nov 29 19:32:12 localhost zookeeper: ERROR
> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected
> exception causing shutdown while sock still open
> Nov 29 19:32:12 localhost zookeeper: WARN
> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE
> /10.1.0.252:50932 ********
> Nov 29 19:32:12 localhost zookeeper: WARN
> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE
> /10.1.48.67:44882 ********
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> state)
> Nov 29 19:32:12 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:12 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:12 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> state)
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> Not sufficient followers synced, only synced with sids: [ 3,4 ]
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> Nov 29 19:32:12 localhost zookeeper: INFO
> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331]
> - exception while shutting down acceptor: java.net.SocketException:
> Socket closed
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> socket connection for client /10.1.10.8:42938 which had sessionid
> 0x45f3a424dd20208
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4
> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676]
> - Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> shutdown of request processor complete
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> Nov 29 19:32:12 localhost zookeeper: INFO
> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
> Nov 29 19:32:12 localhost zookeeper: WARN
> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE
> /10.1.37.117:52956 ********
> Nov 29 19:32:12 localhost zookeeper: WARN
> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring
> unexpected exception
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> election. My id =  4, proposed zxid=0x2b0260b5a8
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to
> 10.1.64.1 port 67 (xid=0x79592ad4)
> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1
> (xid=0x79592ad4)
> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 --
> renewal in 1608 seconds.
> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get
> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases
> of eth0
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 400
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:13 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:13 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 800
> Nov 29 19:32:13 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:13 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:13 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:14 localhost zookeeper: INFO
> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
> Nov 29 19:32:14 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 1600
> Nov 29 19:32:14 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:15 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 3200
> Nov 29 19:32:15 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:18 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 6400
> Nov 29 19:32:18 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0,
> interval 125280ms.
> Nov 29 19:32:25 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 12800
> Nov 29 19:32:25 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:38 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 25600
> Nov 29 19:32:38 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:33:03 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 51200
> Nov 29 19:33:03 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:33:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:33:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:34:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:34:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:35:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:35:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:36:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:36:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:37:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:37:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:38:54 localhost zookeeper: INFO
> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:38:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: WARN
> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for
> id 5, my id = 4, error =
> Nov 29 19:39:45 localhost zookeeper: WARN
> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
> Nov 29 19:39:45 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
> Nov 29 19:39:45 localhost zookeeper: WARN
> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using
> channel: for id 5 my id = 4 error = java.net.SocketException: Socket
> closed
> Nov 29 19:39:45 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
> Nov 29 19:39:45 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 localhost zookeeper: WARN
> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving
> thread
> Nov 29 19:39:47 localhost zookeeper: INFO
> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746]
> - Received connection request /10.1.48.67:58398
> Nov 29 19:39:47 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:47 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:47 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:47 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:47 localhost zookeeper: INFO
> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service
> 'zookeeper-server' on user request
>
>
> And here are logs from one of the followers:
>
> Nov 29 19:32:11 localhost zookeeper: INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
> client: 0x15fc2e9562e0187
> Nov 29 19:32:12 localhost zookeeper: WARN
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> socket connection for client /10.1.29.33:57934 which had sessionid
> 0x35f3a424dc802a2
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] -
> FollowerRequestProcessor exited loop!
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> shutdown of request processor complete
> Nov 29 19:32:12 localhost zookeeper: INFO
> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> Shutting down
> Nov 29 19:32:12 localhost zookeeper: INFO
> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> election. My id =  5, proposed zxid=0x2b0260b5a8
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:12 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 400
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:12 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 800
> Nov 29 19:32:12 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:13 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 1600
> Nov 29 19:32:13 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:15 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 3200
> Nov 29 19:32:15 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:18 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 6400
> Nov 29 19:32:18 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:24 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 12800
> Nov 29 19:32:24 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:32:37 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 25600
> Nov 29 19:32:37 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:33:03 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 51200
> Nov 29 19:33:03 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:33:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:33:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:34:42 localhost zookeeper: WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> end of stream exception
> Nov 29 19:34:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:34:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:35:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:35:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:36:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:36:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:37:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:37:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:38:54 localhost zookeeper: INFO
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> Notification time out: 60000
> Nov 29 19:38:54 localhost zookeeper: INFO
> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> state)
> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service
> 'zookeeper-server' on user request
>
>
Reply | Threaded
Open this post in threaded view
|

Re: ZooKeeper Cluster Failed (and did not recover)

Jim Wong
Abe,

Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think going to 3.4.11 would make a difference?

We can certainly turn on debug logs, but because the problem has only occurred twice in the last several (probably 8-9) months and we’re unable to reproduce it intentionally, it might take a while.

Is there anything else you suggest we do to troubleshoot in the meantime?


> On Nov 29, 2017, at 4:52 PM, Abraham Fine <[hidden email]> wrote:
>
> Hi Jim-
>
> What version of ZooKeeper are you using?
>
> It would be great to get a better understanding of what is causing:
> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> Not sufficient followers synced, only synced with sids: [ 3,4 ]
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
>
> This failure appears to align with:
> Nov 29 19:32:12 localhost zookeeper: WARN
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
>
> Is there any possibility debug logs could be enabled?
>
> Thanks,
> Abe
>
> Would you be able to turn on debug logging, I think that might give us
> much more visibility?
> On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote:
>> Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server
>> configuration and discovery.  This morning, we experienced an outage in
>> which the entire cluster went offline for approximately 7 minutes.
>> Restarting the ZooKeeper daemons on each node in the cluster got things
>> working again, so there doesn’t appear to have been an underlying network
>> or hardware fault that would explain the cluster’s inability to recover.
>>
>> This is actually the second time this has happened in the last several
>> months, and all of our attempts to reproduce the behavior by
>> intentionally doing bad things to the cluster or the leader have failed.
>>
>> With all that in mind, we were wondering if anyone had any thoughts as to
>> what might be going on, as we’re out of ideas.  If anyone can provide any
>> hints or guidance, we’d be very grateful.
>>
>> I’ve included brief excerpts from logs from some of the nodes; the
>> problem started around 17:32:12 and wasn’t resolved until 17:39, after
>> we’d restarted ZooKeeper.  This was was originally the leader:
>>
>> Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4
>> cport:-1)::PrepRequestProcessor@486] - Processed session termination for
>> sessionid: 0x15fdd1d1e0c0313
>> Nov 29 19:32:11 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:11 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:11 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE
>> /10.1.29.76:49198 ********
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE
>> /10.1.0.252:50932 ********
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE
>> /10.1.48.67:44882 ********
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
>> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
>> Not sufficient followers synced, only synced with sids: [ 3,4 ]
>> Nov 29 19:32:12 localhost     at
>> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
>> Nov 29 19:32:12 localhost     at
>> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331]
>> - exception while shutting down acceptor: java.net.SocketException:
>> Socket closed
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
>> socket connection for client /10.1.10.8:42938 which had sessionid
>> 0x45f3a424dd20208
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4
>> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676]
>> - Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
>> shutdown of request processor complete
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE
>> /10.1.37.117:52956 ********
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring
>> unexpected exception
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
>> election. My id =  4, proposed zxid=0x2b0260b5a8
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to
>> 10.1.64.1 port 67 (xid=0x79592ad4)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1
>> (xid=0x79592ad4)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 --
>> renewal in 1608 seconds.
>> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get
>> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
>> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases
>> of eth0
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 400
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:13 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 800
>> Nov 29 19:32:13 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:13 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:14 localhost zookeeper: INFO
>> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
>> Nov 29 19:32:14 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 1600
>> Nov 29 19:32:14 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:15 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 3200
>> Nov 29 19:32:15 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:18 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 6400
>> Nov 29 19:32:18 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0,
>> interval 125280ms.
>> Nov 29 19:32:25 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 12800
>> Nov 29 19:32:25 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:38 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 25600
>> Nov 29 19:32:38 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:03 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 51200
>> Nov 29 19:33:03 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:33:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:34:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:34:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:35:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:35:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:36:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:36:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:37:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:37:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:38:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:38:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: WARN
>> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for
>> id 5, my id = 4, error =
>> Nov 29 19:39:45 localhost zookeeper: WARN
>> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
>> Nov 29 19:39:45 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
>> Nov 29 19:39:45 localhost zookeeper: WARN
>> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using
>> channel: for id 5 my id = 4 error = java.net.SocketException: Socket
>> closed
>> Nov 29 19:39:45 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
>> Nov 29 19:39:45 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: WARN
>> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving
>> thread
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746]
>> - Received connection request /10.1.48.67:58398
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service
>> 'zookeeper-server' on user request
>>
>>
>> And here are logs from one of the followers:
>>
>> Nov 29 19:32:11 localhost zookeeper: INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
>> client: 0x15fc2e9562e0187
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>> following the leader
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
>> socket connection for client /10.1.29.33:57934 which had sessionid
>> 0x35f3a424dc802a2
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] -
>> FollowerRequestProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
>> shutdown of request processor complete
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
>> election. My id =  5, proposed zxid=0x2b0260b5a8
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 400
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 800
>> Nov 29 19:32:12 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 1600
>> Nov 29 19:32:13 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:15 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 3200
>> Nov 29 19:32:15 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:18 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 6400
>> Nov 29 19:32:18 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:24 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 12800
>> Nov 29 19:32:24 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:37 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 25600
>> Nov 29 19:32:37 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:03 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 51200
>> Nov 29 19:33:03 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:33:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:34:42 localhost zookeeper: WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:34:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:34:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:35:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:35:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:36:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:36:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:37:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:37:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:38:54 localhost zookeeper: INFO
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:38:54 localhost zookeeper: INFO
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service
>> 'zookeeper-server' on user request
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: ZooKeeper Cluster Failed (and did not recover)

Abraham Fine
Hi Jim-

I don't think moving to 3.4.11 should make too much of a difference, it
just helps to know the version so I can map the log messages to code.

There are a couple things you could look at in the meantime. I would
start with heap usage to make sure this is not the result of GC pauses.
You may also want to reevaluate the initLimit if you are storing a large
amount of data in ZK.

Hopefully this helps.

Abe

On Thu, Nov 30, 2017, at 10:15, Jim Wong wrote:

> Abe,
>
> Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think
> going to 3.4.11 would make a difference?
>
> We can certainly turn on debug logs, but because the problem has only
> occurred twice in the last several (probably 8-9) months and we’re unable
> to reproduce it intentionally, it might take a while.
>
> Is there anything else you suggest we do to troubleshoot in the meantime?
>
>
> > On Nov 29, 2017, at 4:52 PM, Abraham Fine <[hidden email]> wrote:
> >
> > Hi Jim-
> >
> > What version of ZooKeeper are you using?
> >
> > It would be great to get a better understanding of what is causing:
> > Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> > Not sufficient followers synced, only synced with sids: [ 3,4 ]
> > Nov 29 19:32:12 localhost     at
> > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> > Nov 29 19:32:12 localhost     at
> > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> >
> > This failure appears to align with:
> > Nov 29 19:32:12 localhost zookeeper: WARN
> > [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> > following the leader
> >
> > Is there any possibility debug logs could be enabled?
> >
> > Thanks,
> > Abe
> >
> > Would you be able to turn on debug logging, I think that might give us
> > much more visibility?
> > On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote:
> >> Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server
> >> configuration and discovery.  This morning, we experienced an outage in
> >> which the entire cluster went offline for approximately 7 minutes.
> >> Restarting the ZooKeeper daemons on each node in the cluster got things
> >> working again, so there doesn’t appear to have been an underlying network
> >> or hardware fault that would explain the cluster’s inability to recover.
> >>
> >> This is actually the second time this has happened in the last several
> >> months, and all of our attempts to reproduce the behavior by
> >> intentionally doing bad things to the cluster or the leader have failed.
> >>
> >> With all that in mind, we were wondering if anyone had any thoughts as to
> >> what might be going on, as we’re out of ideas.  If anyone can provide any
> >> hints or guidance, we’d be very grateful.
> >>
> >> I’ve included brief excerpts from logs from some of the nodes; the
> >> problem started around 17:32:12 and wasn’t resolved until 17:39, after
> >> we’d restarted ZooKeeper.  This was was originally the leader:
> >>
> >> Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4
> >> cport:-1)::PrepRequestProcessor@486] - Processed session termination for
> >> sessionid: 0x15fdd1d1e0c0313
> >> Nov 29 19:32:11 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:11 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:11 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.29.76:49198 ********
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.0.252:50932 ********
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.48.67:44882 ********
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
> >> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> >> Not sufficient followers synced, only synced with sids: [ 3,4 ]
> >> Nov 29 19:32:12 localhost     at
> >> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> >> Nov 29 19:32:12 localhost     at
> >> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331]
> >> - exception while shutting down acceptor: java.net.SocketException:
> >> Socket closed
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> >> socket connection for client /10.1.10.8:42938 which had sessionid
> >> 0x45f3a424dd20208
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4
> >> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676]
> >> - Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> >> shutdown of request processor complete
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.37.117:52956 ********
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring
> >> unexpected exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> >> election. My id =  4, proposed zxid=0x2b0260b5a8
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to
> >> 10.1.64.1 port 67 (xid=0x79592ad4)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1
> >> (xid=0x79592ad4)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 --
> >> renewal in 1608 seconds.
> >> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get
> >> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
> >> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases
> >> of eth0
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 400
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:13 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 800
> >> Nov 29 19:32:13 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:13 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:14 localhost zookeeper: INFO
> >> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
> >> Nov 29 19:32:14 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 1600
> >> Nov 29 19:32:14 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:15 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 3200
> >> Nov 29 19:32:15 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:18 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 6400
> >> Nov 29 19:32:18 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0,
> >> interval 125280ms.
> >> Nov 29 19:32:25 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 12800
> >> Nov 29 19:32:25 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:38 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 25600
> >> Nov 29 19:32:38 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:03 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 51200
> >> Nov 29 19:33:03 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:33:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:34:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:34:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:35:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:35:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:36:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:36:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:37:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:37:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:38:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:38:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: WARN
> >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for
> >> id 5, my id = 4, error =
> >> Nov 29 19:39:45 localhost zookeeper: WARN
> >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
> >> Nov 29 19:39:45 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
> >> Nov 29 19:39:45 localhost zookeeper: WARN
> >> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using
> >> channel: for id 5 my id = 4 error = java.net.SocketException: Socket
> >> closed
> >> Nov 29 19:39:45 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
> >> Nov 29 19:39:45 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: WARN
> >> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving
> >> thread
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746]
> >> - Received connection request /10.1.48.67:58398
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service
> >> 'zookeeper-server' on user request
> >>
> >>
> >> And here are logs from one of the followers:
> >>
> >> Nov 29 19:32:11 localhost zookeeper: INFO
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
> >> client: 0x15fc2e9562e0187
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> >> following the leader
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> >> socket connection for client /10.1.29.33:57934 which had sessionid
> >> 0x35f3a424dc802a2
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] -
> >> FollowerRequestProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> >> shutdown of request processor complete
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> >> election. My id =  5, proposed zxid=0x2b0260b5a8
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 400
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 800
> >> Nov 29 19:32:12 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 1600
> >> Nov 29 19:32:13 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:15 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 3200
> >> Nov 29 19:32:15 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:18 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 6400
> >> Nov 29 19:32:18 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:24 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 12800
> >> Nov 29 19:32:24 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:37 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 25600
> >> Nov 29 19:32:37 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:03 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 51200
> >> Nov 29 19:33:03 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:33:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:34:42 localhost zookeeper: WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:34:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:34:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:35:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:35:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:36:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:36:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:37:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:37:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:38:54 localhost zookeeper: INFO
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:38:54 localhost zookeeper: INFO
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service
> >> 'zookeeper-server' on user request
> >>
> >>
>