Error connecting to ZooKeeper server


Michael Chen
Hi,

I've run into a ZooKeeper connection error during the execution of a
Nutch Hadoop job. The tasks stall on a connection error to the ZooKeeper
server. Here's what I know:

1. The ZK connection error is the only known problem; other logs report no issues.

2. The error message on the YARN NodeManager on one of the slaves is:

2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

The connection keeps failing until it hits the 10-minute limit and the task
fails.

3. The ZooKeeper server is deployed only on the master.

4. The cluster is managed by Cloudera Manager 5.12.

Could a configuration on the Nutch side or the Cloudera Manager side be missing?
There are no ZK servers on the slaves, so the NodeManager should be
connecting to the ZK server on the master instead of localhost:2181.

Any suggestions or help would be greatly appreciated!

Thank you,

Michael


Re: Error connecting to ZooKeeper server

Michael Chen
Also, the cluster is on AWS. The security group is set to allow all inbound
and outbound traffic.

Any ideas?




Re: Error connecting to ZooKeeper server

Martin Gainty



________________________________
From: Michael Chen <[hidden email]>
Sent: Wednesday, August 16, 2017 3:47 PM
To: [hidden email]; [hidden email]; [hidden email]
Subject: Re: Error connecting to ZooKeeper server

Also, the cluster is on AWS. Security group set to allow all inbound and
outbound traffic...
MG> Can you verify with netstat -lpn that ALL inbound ports and ALL outbound ports are enabled and listening?

Any ideas?...

MG> To rule out AWS as the culprit, what happens when you disable the problematic AWS security group?
https://groups.google.com/forum/#!topic/chronos-scheduler/ys77mol0aWQ
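
One way to tell a missing listener apart from a network block is to probe the port directly. The following is a minimal Python sketch and not something posted in the thread; `probe_tcp` is a hypothetical helper name. A "refused" result usually means the host was reachable but nothing accepted the connection on that port, whereas a "timeout" is more typical of a firewall or security group silently dropping packets.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error" (for anything else,
    such as a hostname that does not resolve).
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: reachable, but no listener
        # (or an active reject rule).
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: often a dropped packet.
        return "timeout"
    except OSError:
        return "error"
```

Running `probe_tcp("localhost", 2181)` on a slave, and then the same call with the master's hostname from that slave, would show whether ZooKeeper is reachable where the client is looking versus where it actually runs.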


Re: Error connecting to ZooKeeper server

Dan Benediktson
Given that it's trying to connect to localhost:2181 when it's expected to
connect to a remote machine, and that the error is "Connection refused"
(which almost certainly means either that a firewall rejected the connection
or that no process was listening on that TCP port; since this is localhost,
it pretty much has to be the latter), there must be some simple configuration
problem on the side of whatever is talking to ZooKeeper. That's not to say you
won't have firewall problems after you resolve that, but first things
first: configure it so it's actually talking to the ZK ensemble.
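
The reasoning above starts from the address the client logs before it fails. As a rough illustration, the Python sketch below pulls that address out of ZooKeeper client log lines like the ones in the original post and flags loopback targets. The function names are hypothetical, and the pattern assumes the "Opening socket connection to server host/ip:port" wording with an IPv4 address, as seen in this thread.

```python
import re

# Matches e.g. "Opening socket connection to server localhost/127.0.0.1:2181."
OPENING = re.compile(
    r"Opening socket connection to server (\S+?)/(\d+\.\d+\.\d+\.\d+):(\d+)"
)


def zk_targets(log_text):
    """Return the set of (hostname, ip, port) tuples the client tried to reach."""
    return {
        (m.group(1), m.group(2), int(m.group(3)))
        for m in OPENING.finditer(log_text)
    }


def loopback_targets(log_text):
    """Return the targets in the 127.0.0.0/8 range, sorted for stable output.

    A non-empty result on a node that runs no ZooKeeper server suggests the
    client never received the real ensemble address.
    """
    return sorted(t for t in zk_targets(log_text) if t[1].startswith("127."))
```

If `loopback_targets` returns anything for a task log from a slave, the client configuration is the first thing to check, before any firewall settings.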


Re: Error connecting to ZooKeeper server

Michael Chen
Thanks for the reply! The firewall is disabled and ZK is running fine per Cloudera Manager.

I might have fixed the problem by including the relevant properties (ZK quorum, distributed mode) in hbase-site.xml and nutch-site.xml. I'm unsure whether I should also include other properties or settings.

Thanks,
Michael
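
The message above does not list the exact property names. The Python sketch below assumes they are the standard HBase settings `hbase.zookeeper.quorum` and `hbase.cluster.distributed`, which seem to correspond to "ZK quorum" and "distributed mode". It reads a Hadoop-style `*-site.xml` document and reports settings that would leave a client pointed at the local host. The function names and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET

LOOPBACK = {"localhost", "127.0.0.1"}


def read_properties(xml_text):
    """Parse <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def quorum_problems(props):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    quorum = props.get("hbase.zookeeper.quorum", "")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    if not hosts:
        problems.append("hbase.zookeeper.quorum is not set")
    # Drop an optional ":port" suffix before comparing host names.
    elif any(h.split(":")[0] in LOOPBACK for h in hosts):
        problems.append("hbase.zookeeper.quorum points at the local host")
    if props.get("hbase.cluster.distributed", "false").lower() != "true":
        problems.append("hbase.cluster.distributed is not true")
    return problems
```

Passing the contents of the hbase-site.xml that the job actually loads to `quorum_problems(read_properties(...))` gives a quick check that the quorum names the master rather than the local host. This only inspects the file it is given; it cannot tell whether that file is on the task's classpath.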
