Hi,
Something abnormal happened to my Hadoop cluster. In CDH4, the default location of the ZooKeeper snapshot directory and dataDir is /var/lib/zookeeper. The disk holding /var filled up and the cluster went down (ZooKeeper and HBase were both in ERROR status). I have cleaned up /var, but it seems the ZooKeeper snapshot and dataDir location is not getting updated, and the HBase master is not able to connect to ZooKeeper.

We restarted ZooKeeper and HBase a couple of times. We also stopped the ZooKeeper nodes one by one to isolate the corrupted node, but it seems that all 3 ZooKeeper nodes got corrupted. This is strange, as only one server's disk filled up.

Here is the exception we got:

2013-01-18 15:17:20,840 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-01-18 15:17:20,840 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /168.72.70.92:39880 (no session established for client)
2013-01-18 15:17:20,922 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,123 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,328 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 829
2013-01-18 15:17:21,330 WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
    at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:272)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-01-18 15:17:21,330 INFO org.apache.zookeeper.server.quorum.Learner: shutdown called
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.FollowerZooKeeperServer: Shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.ZooKeeperServer: shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.700000092
2013-01-18 15:17:21,348 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election. My id = 1, proposed zxid=0x7000000f4
2013-01-18 15:17:21,349 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 1 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,349 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 4 at election address dgmstsw001.nam.nsroot.net/168.72.70.89:4181
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)

I would appreciate any help in this matter.

Thanks,
Saurabh.
