zookeeper dataDir corrupted and now hbase can not connect to zookeeper

sam
Hi,
Something went wrong with my Hadoop cluster. In CDH4, the default location for the ZooKeeper snapshots and dataDir is /var/lib/zookeeper. The disk that holds /var filled up and the cluster went down (both ZooKeeper and HBase were in ERROR status). I have freed up space under /var, but ZooKeeper still does not seem to be updating its snapshot and dataDir location, and the HBase master cannot connect to ZooKeeper.

We restarted ZooKeeper and HBase a couple of times. We also stopped the ZooKeeper nodes one by one to isolate the corrupted node, but it looks like all 3 ZooKeeper nodes are corrupted. This is strange, as only one server's disk filled up.
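
For reference, this is a minimal sketch of how the state of the data directory could be checked on each node. The function name inspect_data_dir is hypothetical, and treating a zero-length snapshot or log file as suspicious is only my assumption about what a full disk might leave behind, not a rule documented by ZooKeeper. The version-2 subdirectory and the snapshot.* file naming are taken from the log output in this post.

```python
import os
import shutil


def inspect_data_dir(data_dir):
    """Report free space and snapshot/log file sizes under a ZooKeeper dataDir.

    Hypothetical helper for illustration only; it reads file metadata and
    never modifies anything.
    """
    version_dir = os.path.join(data_dir, "version-2")
    usage = shutil.disk_usage(data_dir)  # total/used/free for the volume

    files = []        # (name, size in bytes) for every snapshot.* and log.* file
    empty_files = []  # names of zero-length files (assumed to be suspicious)

    if os.path.isdir(version_dir):
        for entry in sorted(os.scandir(version_dir), key=lambda e: e.name):
            if not entry.is_file():
                continue
            if not entry.name.startswith(("snapshot.", "log.")):
                continue
            size = entry.stat().st_size
            files.append((entry.name, size))
            if size == 0:
                empty_files.append(entry.name)

    return {
        "free_bytes": usage.free,
        "files": files,
        "empty_files": empty_files,
    }
```

Calling inspect_data_dir("/var/lib/zookeeper") on each of the three nodes and comparing the results should at least show whether the disk is still short on space and which files differ between nodes.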

Here is the exception we got:

2013-01-18 15:17:20,840 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-01-18 15:17:20,840 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /168.72.70.92:39880 (no session established for client)
2013-01-18 15:17:20,922 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,123 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 3 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,328 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 60000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2013-01-18 15:17:21,329 INFO org.apache.zookeeper.server.quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 829
2013-01-18 15:17:21,330 WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader
java.io.EOFException
       at java.io.DataInputStream.readInt(DataInputStream.java:375)
       at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
       at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
       at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
       at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:272)
       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-01-18 15:17:21,330 INFO org.apache.zookeeper.server.quorum.Learner: shutdown called
java.lang.Exception: shutdown Follower
       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.FollowerZooKeeperServer: Shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.ZooKeeperServer: shutting down
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2013-01-18 15:17:21,331 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /var/lib/zookeeper/version-2/snapshot.700000092
2013-01-18 15:17:21,348 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election. My id =  1, proposed zxid=0x7000000f4
2013-01-18 15:17:21,349 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1 (n.leader), 0x7000000f4 (n.zxid), 0x285f (n.round), LOOKING (n.state), 1 (n.sid), 0x7 (n.peerEPoch), LOOKING (my state)
2013-01-18 15:17:21,349 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 4 at election address dgmstsw001.nam.nsroot.net/168.72.70.89:4181
java.net.ConnectException: Connection refused
       at java.net.PlainSocketImpl.socketConnect(Native Method)
       at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
       at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
       at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
       at java.net.Socket.connect(Socket.java:529)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
       at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
       at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
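
To make the election messages above easier to compare across nodes, a small sketch like the following could pull the fields out of the FastLeaderElection "Notification" lines. The function name parse_notifications and the dictionary keys are my own choices for illustration. The regular expression is written against the log format shown in this post and may not match other ZooKeeper versions. Splitting the zxid into an epoch (high 32 bits) and a counter (low 32 bits) follows the usual ZooKeeper zxid layout.

```python
import re

# Matches the Notification lines exactly as they appear in the log above.
NOTIFICATION_RE = re.compile(
    r"Notification: (\d+) \(n\.leader\), "
    r"(0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(0x[0-9a-fA-F]+) \(n\.round\), "
    r"(\w+) \(n\.state\), "
    r"(\d+) \(n\.sid\), "
    r"(0x[0-9a-fA-F]+) \(n\.peerEPoch\), "
    r"(\w+) \(my state\)"
)


def parse_notifications(log_text):
    """Return one dict per FastLeaderElection Notification line in log_text."""
    votes = []
    for match in NOTIFICATION_RE.finditer(log_text):
        leader, zxid, rnd, state, sid, peer_epoch, my_state = match.groups()
        zxid_value = int(zxid, 16)
        votes.append({
            "leader": int(leader),              # server id being proposed as leader
            "zxid": zxid_value,                 # last transaction id seen by the sender
            "epoch": zxid_value >> 32,          # high 32 bits of the zxid
            "counter": zxid_value & 0xFFFFFFFF, # low 32 bits of the zxid
            "round": int(rnd, 16),              # election round
            "state": state,                     # sender's state
            "sid": int(sid),                    # sender's server id
            "peer_epoch": int(peer_epoch, 16),
            "my_state": my_state,               # state of the node writing the log
        })
    return votes
```

Running this over the log from each node would show which server id each peer is proposing and whether they agree on the zxid and election round.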

Appreciate any help in this matter,

Thanks,
Saurabh.