Zookeeper - java.net.SocketException: Socket closed

Zookeeper - java.net.SocketException: Socket closed

upendar devu
We are getting the error below about twice a month. Although it resolves itself, can anyone explain why this error occurs and what needs to be done to prevent it? Is this a common error that can be ignored?

Please suggest.


2018-01-16 20:36:17,378 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@780] - Connection broken for id 3, my id = 2, error =
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:224)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
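
For context on what this exception means: the JDK throws "SocketException: Socket closed" when a blocking read is interrupted because the socket was closed from within the same process, typically by another thread, which is what happens when QuorumCnxManager tears down a quorum connection. An orderly close by the remote peer would instead show up as end-of-stream, and an abort as "Connection reset". A minimal standalone sketch (plain JDK sockets, not ZooKeeper code) that reproduces the message:

import java.io.DataInputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketClosedDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();

            // Block in readInt(), the same call RecvWorker is sitting in.
            Thread reader = new Thread(() -> {
                try {
                    new DataInputStream(accepted.getInputStream()).readInt();
                } catch (Exception e) {
                    // Prints something like: java.net.SocketException: Socket closed
                    System.out.println(e);
                }
            });
            reader.start();

            Thread.sleep(200);  // give the reader time to block
            accepted.close();   // local close from another thread...
            reader.join();      // ...wakes the blocked read with "Socket closed"
            client.close();
        }
    }
}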

Re: Zookeeper - java.net.SocketException: Socket closed

Andor Molnar
Hi Upendar,

Thanks for reporting the issue.
I have a gut feeling about which existing bug you've run into, but could you please share some more details (ZooKeeper version, log context, config files, etc.) so I can confirm?

Thanks,
Andor



Re: Zookeeper - java.net.SocketException: Socket closed

upendar devu
Thanks Andor for the reply.

We are using ZooKeeper version 3.4.6 with 3 instances. Please see the configuration below; I believe we are using the default configuration. I have attached the zk log. The issue occurred at First Occurrence: 01/23/2018 07:42:22, Last Occurrence: 01/23/2018 07:43:22.

The issue occurs 3 to 4 times a month and resolves itself within a few minutes, but it is really annoying our operations team. Please let me know if you need any additional details.



# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial synchronization phase can take
initLimit=10

# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5

# The directory where the snapshot is stored.
dataDir=/opt/zookeeper/current/data

# The port at which the clients will connect
clientPort=2181

# This is the list of Zookeeper peers:
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

# The interface IP address(es) on which ZooKeeper will listen
clientPortAddress=<IP of zk>

# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3

# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
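
For reference, with tickTime=2000 these limits translate into the following effective timeouts (straightforward arithmetic from the settings above):

  initLimit * tickTime = 10 * 2000 ms = 20 s  (maximum time for a follower to connect and sync with the leader)
  syncLimit * tickTime =  5 * 2000 ms = 10 s  (maximum lag allowed between a follower and the leader)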


Attachment: Zk_log.txt (239K)

Re: Zookeeper - java.net.SocketException: Socket closed

Andor Molnar
No, this is not the bug I was thinking of.

It looks like the network connection is poor between the leader and the follower whose logs were attached. Do you have any network monitoring tools in place, or do you see any network-related error messages in your kernel logs?

The follower lost its connection to the leader:
2018-01-23 07:40:21,709 [myid:3] - WARN [SyncThread:3:SendAckRequestProcessor@64] - Closing connection to leader, exception during packet send

...and it took ages to recover: 944 seconds (almost 16 minutes)!
2018-01-23 07:56:05,742 [myid:3] - INFO [QuorumPeer[myid=3]/XX.XX.XX:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 944020

Additionally, a disk write has taken too long as well:
2018-01-23 07:40:21,706 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 13638ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
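
If you want to check the disk independently of ZooKeeper, one rough option is to time forced syncs roughly the way the transaction log does (a small write followed by FileChannel.force). A minimal sketch, assuming you point it at a scratch file on the same volume as dataDir (the default path below is only an example):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FsyncProbe {
    public static void main(String[] args) throws Exception {
        // Assumption: the probe file lives on the same volume as dataDir.
        String path = args.length > 0 ? args[0] : "/opt/zookeeper/current/data/fsync.probe";
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(512);
            for (int i = 0; i < 10; i++) {
                buf.clear();
                ch.write(buf, 0);              // small write, like a txn log append
                long start = System.nanoTime();
                ch.force(false);               // flush data to disk
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println("fsync " + i + ": " + ms + " ms");
            }
        }
    }
}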

I believe this is worth a closer look, though I'm not a ZooKeeper expert; maybe somebody else can give you more insight.

Regards,
Andor



Re: Zookeeper - java.net.SocketException: Socket closed

upendar devu
Thanks for sharing your analysis. These are running on EC2 instances, and we have Kafka, ZK, Storm, and ES instances as well, but we have not seen such errors in those components. If there were network latency, there should be socket errors in the other components too, since data is being processed every second.

Let's hear from the ZooKeeper dev team; I hope they will respond.


Re: Zookeeper - java.net.SocketException: Socket closed

upendar devu
"A disk write has taken too long as well": I will check on this, thanks for finding it. The zk logs are really a bit difficult for me to understand.


Re: Zookeeper - java.net.SocketException: Socket closed

Andor Molnar
Use EBS drives and make sure you allocate enough IOPS for the load.
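
One way to verify whether the volume change helps is to watch the server's latency counters over time; ZooKeeper's four-letter "mntr" command (available since 3.4.0) reports them on the client port. A minimal sketch, assuming the server listens on localhost:2181:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkMntr {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("localhost", 2181)) {
            // Send the four-letter word; the server replies and closes the socket.
            OutputStream out = s.getOutputStream();
            out.write("mntr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null) {
                // Print the latency counters (zk_avg_latency, zk_max_latency, ...)
                if (line.contains("latency")) {
                    System.out.println(line);
                }
            }
        }
    }
}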

Andor



Re: Zookeeper - java.net.SocketException: Socket closed

upendar devu
Thank you, will check
