Sessions Expire due to Network partitioning in Zookeeper

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
​Recently, carried out a test to to find the behavior of clients when a
client is partitioned from the ensemble.

Here I used a ensemble of 3 zookeeper servers called A, B and C. And quorum
was set up like below.

A - Follower
B - Leader
C - Follower​

A  <---> B <---> C
   \____________/

And 3 clients are connected to ensemble like below.

C1 is connected A
C2 is connected B
C3 is connected C.

I used iptables to remove the network link between B and C.

command used: iptables -I INPUT -s 123.123.45.123 -j DROP

After removing the link connections looks like below.

A  <----> B         C
   \____________/

Simply there is no way to communicate from B to C and vice versa.

Here What I noticed is that the client connected to Zookeeper Server "C",
could not connect to the ensemble resulting a session expiration timeout.

For this experiment I used tickTime of 3000ms and client session expiration
timeout of 45000ms. And tested with different combinations also.

Can someone please explain what is the root cause for this behavior?
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Rakesh Radhakrishnan-2
Hi,

Could you please give few more details,

1) Could you tell me the status of Server C, is this lost connection to the
quorum and fails to join quorum continuously as B is the Leader ?

2) C3 is connected C. Please tell me the connection host string passed to
this client. Does it contains all three servers info "A:clientport,
B:clientport, C:clientport" ?

3) Please check all three servers and client C3 logs to see any
inconsistencies or exceptions.

4) ZooKeeper version used in your testing ?


Rakesh

On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <[hidden email]>
wrote:

> ​Recently, carried out a test to to find the behavior of clients when a
> client is partitioned from the ensemble.
>
> Here I used a ensemble of 3 zookeeper servers called A, B and C. And quorum
> was set up like below.
>
> A - Follower
> B - Leader
> C - Follower​
>
> A  <---> B <---> C
>    \____________/
>
> And 3 clients are connected to ensemble like below.
>
> C1 is connected A
> C2 is connected B
> C3 is connected C.
>
> I used iptables to remove the network link between B and C.
>
> command used: iptables -I INPUT -s 123.123.45.123 -j DROP
>
> After removing the link connections looks like below.
>
> A  <----> B         C
>    \____________/
>
> Simply there is no way to communicate from B to C and vice versa.
>
> Here What I noticed is that the client connected to Zookeeper Server "C",
> could not connect to the ensemble resulting a session expiration timeout.
>
> For this experiment I used tickTime of 3000ms and client session expiration
> timeout of 45000ms. And tested with different combinations also.
>
> Can someone please explain what is the root cause for this behavior?
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
 > ​
1) Could you tell me the status of Server C, is this lost connection to the
 >     quorum and fails to join quorum continuously as B is the Leader ?

Yes, B the leader. C Server is completely isolated from the Leader(B)
and It cannot communicate with the Leader. C cannot continuously connect to
the

Leader.


 > 2) C3 is connected C. Please tell me the connection host string passed to
 >     this client. Does it contains all three servers info "A:clientport,
 >    B:clientport, C:clientport" ?

Yes, C3's connection string contains all three servers. ("A:clientport,
B:clientport, C:clientport")


 > 3) Please check all three servers and client C3 logs to see any
 >    inconsistencies or exceptions.

After looking at logs, it seems when the server C isolated from the Leader,

a disconnect event fires to client C3. Then it (C3) tries too much time to
connect to Server B(Leader) .

But it cannot connect to server B, as we blocked the connection between
Server C and

Server B. Basically, C3 tries more than half of the session timeout time to
connect to Server B.

Then after figuring out that C3 cannot to connect to Server B, it tries to
connect

to Server A, and it connects to Server A successfully. But this is too
late, because

session is already expired at the time C3 connected.

And this happens sometimes only. Because when we specify all the servers in
the client's

connect string, sometimes after C3 disconnecting from Server C, instead of
trying to connect to

Server B it connects to Server A as the first attempt. In this case the
client C3 connects to the

quorum successfully before the session expiration.

According to my understanding, it looks like, when a client trying to
connect to a server that it cannot

connect due to a network partitioning, it uses a blocking call and it waits
too much time trying to

connect to a server that it cannot communicate.



 > 4) ZooKeeper version used in your testing ?

I used zookeeper 3.4.9 (current stable release)



On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <[hidden email]>
wrote:

> Hi,
>
> Could you please give few more details,
>
> ​​
> 1) Could you tell me the status of Server C, is this lost connection to the
> quorum and fails to join quorum continuously as B is the Leader ?
>
> 2) C3 is connected C. Please tell me the connection host string passed to
> this client. Does it contains all three servers info "A:clientport,
> B:clientport, C:clientport" ?
>
> 3) Please check all three servers and client C3 logs to see any
> inconsistencies or exceptions.
>
> 4) ZooKeeper version used in your testing ?
>
>
> Rakesh
>
> On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <[hidden email]
> >
> wrote:
>
> > ​Recently, carried out a test to to find the behavior of clients when a
> > client is partitioned from the ensemble.
> >
> > Here I used a ensemble of 3 zookeeper servers called A, B and C. And
> quorum
> > was set up like below.
> >
> > A - Follower
> > B - Leader
> > C - Follower​
> >
> > A  <---> B <---> C
> >    \____________/
> >
> > And 3 clients are connected to ensemble like below.
> >
> > C1 is connected A
> > C2 is connected B
> > C3 is connected C.
> >
> > I used iptables to remove the network link between B and C.
> >
> > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> >
> > After removing the link connections looks like below.
> >
> > A  <----> B         C
> >    \____________/
> >
> > Simply there is no way to communicate from B to C and vice versa.
> >
> > Here What I noticed is that the client connected to Zookeeper Server "C",
> > could not connect to the ensemble resulting a session expiration timeout.
> >
> > For this experiment I used tickTime of 3000ms and client session
> expiration
> > timeout of 45000ms. And tested with different combinations also.
> >
> > Can someone please explain what is the root cause for this behavior?
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Rakesh Radhakrishnan-2
>>>> According to my understanding, it looks like, when a client trying to
>>>> connect to a server that it cannot connect due to a network
partitioning,
>>>> it uses a blocking call and it waits too much time trying to
>>>> connect to a server that it cannot communicate.

Actually, ZooKeeper client has retry mechanism.
Client sends a ping every 1/3 the session timeout (here, 3 is the no. of
listed servers, A, B, C)
and then looks for a response before another 1/3 elapses. This allows time
to reconnect to a
different server (and still maintain the session) if the connected server
becomes unavailable.

Could you grep the following log message in your client log and tell me how
much time C3 taken for the re-connection attempts.
"Client session timed out, have not heard from server in "

C3 might have first attempted to reconnect to B and then A. Also, need to
check how much time C3 taken to detect connection failure from server C.

Could you please share the zk client log to dig more.

Rakesh


On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <[hidden email]>
wrote:

>  > ​
> 1) Could you tell me the status of Server C, is this lost connection to the
>  >     quorum and fails to join quorum continuously as B is the Leader ?
>
> Yes, B the leader. C Server is completely isolated from the Leader(B)
> and It cannot communicate with the Leader. C cannot continuously connect to
> the
>
> Leader.
>
>
>  > 2) C3 is connected C. Please tell me the connection host string passed
> to
>  >     this client. Does it contains all three servers info "A:clientport,
>  >    B:clientport, C:clientport" ?
>
> Yes, C3's connection string contains all three servers. ("A:clientport,
> B:clientport, C:clientport")
>
>
>  > 3) Please check all three servers and client C3 logs to see any
>  >    inconsistencies or exceptions.
>
> After looking at logs, it seems when the server C isolated from the Leader,
>
> a disconnect event fires to client C3. Then it (C3) tries too much time to
> connect to Server B(Leader) .
>
> But it cannot connect to server B, as we blocked the connection between
> Server C and
>
> Server B. Basically, C3 tries more than half of the session timeout time to
> connect to Server B.
>
> Then after figuring out that C3 cannot to connect to Server B, it tries to
> connect
>
> to Server A, and it connects to Server A successfully. But this is too
> late, because
>
> session is already expired at the time C3 connected.
>
> And this happens sometimes only. Because when we specify all the servers in
> the client's
>
> connect string, sometimes after C3 disconnecting from Server C, instead of
> trying to connect to
>
> Server B it connects to Server A as the first attempt. In this case the
> client C3 connects to the
>
> quorum successfully before the session expiration.
>
> According to my understanding, it looks like, when a client trying to
> connect to a server that it cannot
>
> connect due to a network partitioning, it uses a blocking call and it waits
> too much time trying to
>
> connect to a server that it cannot communicate.
>
>
>
>  > 4) ZooKeeper version used in your testing ?
>
> I used zookeeper 3.4.9 (current stable release)
>
>
>
> On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <[hidden email]>
> wrote:
>
> > Hi,
> >
> > Could you please give few more details,
> >
> > ​​
> > 1) Could you tell me the status of Server C, is this lost connection to
> the
> > quorum and fails to join quorum continuously as B is the Leader ?
> >
> > 2) C3 is connected C. Please tell me the connection host string passed to
> > this client. Does it contains all three servers info "A:clientport,
> > B:clientport, C:clientport" ?
> >
> > 3) Please check all three servers and client C3 logs to see any
> > inconsistencies or exceptions.
> >
> > 4) ZooKeeper version used in your testing ?
> >
> >
> > Rakesh
> >
> > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> [hidden email]
> > >
> > wrote:
> >
> > > ​Recently, carried out a test to to find the behavior of clients when a
> > > client is partitioned from the ensemble.
> > >
> > > Here I used a ensemble of 3 zookeeper servers called A, B and C. And
> > quorum
> > > was set up like below.
> > >
> > > A - Follower
> > > B - Leader
> > > C - Follower​
> > >
> > > A  <---> B <---> C
> > >    \____________/
> > >
> > > And 3 clients are connected to ensemble like below.
> > >
> > > C1 is connected A
> > > C2 is connected B
> > > C3 is connected C.
> > >
> > > I used iptables to remove the network link between B and C.
> > >
> > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > >
> > > After removing the link connections looks like below.
> > >
> > > A  <----> B         C
> > >    \____________/
> > >
> > > Simply there is no way to communicate from B to C and vice versa.
> > >
> > > Here What I noticed is that the client connected to Zookeeper Server
> "C",
> > > could not connect to the ensemble resulting a session expiration
> timeout.
> > >
> > > For this experiment I used tickTime of 3000ms and client session
> > expiration
> > > timeout of 45000ms. And tested with different combinations also.
> > >
> > > Can someone please explain what is the root cause for this behavior?
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
​​
Hi Rakesh,

First of all thank you for the quick reply.

>>>>> Actually, ZooKeeper client has retry mechanism.
>>>>> Client sends a ping every 1/3 the session timeout (here, 3 is the no.
of listed servers, A, B, C) and then looks for a response before another
1/3 elapses. This allows time to reconnect to a different server (and still
maintain the session) if the connected server becomes unavailable.

You mentioned that a client sends a ping every 1/3 the session timeout. And
3 is the no of listed servers.

I doubt that. Because, I am using the C Binding and after inspecting the
code it looks like that 3 is a hard coded value.
Simply no matter what the number of clients, zk client biding is always
sending a ping every 1/3 session timeout.

Can please clarify that for me?

Here I used a tick of 3000ms and session expiration timeout of 45000ms.

And please find the screenshot of extacted client log outout.

https://anonimag.es/image/JT9htnL

It looks like C3 has taken 14 seconds to determine the disconnected event
and another 14 seconds to that it cannot connect to Server B(C3 is isolated
from B).



On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <[hidden email]>
wrote:

> >>>> According to my understanding, it looks like, when a client trying to
> >>>> connect to a server that it cannot connect due to a network
> partitioning,
> >>>> it uses a blocking call and it waits too much time trying to
> >>>> connect to a server that it cannot communicate.
>
> Actually, ZooKeeper client has retry mechanism.
> Client sends a ping every 1/3 the session timeout (here, 3 is the no. of
> listed servers, A, B, C)
> and then looks for a response before another 1/3 elapses. This allows time
> to reconnect to a
> different server (and still maintain the session) if the connected server
> becomes unavailable.
>
> Could you grep the following log message in your client log and tell me how
> much time C3 taken for the re-connection attempts.
> "Client session timed out, have not heard from server in "
>
> C3 might have first attempted to reconnect to B and then A. Also, need to
> check how much time C3 taken to detect connection failure from server C.
>
> Could you please share the zk client log to dig more.
>
> Rakesh
>
>
> On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> [hidden email]>
> wrote:
>
> >  > ​
> > 1) Could you tell me the status of Server C, is this lost connection to
> the
> >  >     quorum and fails to join quorum continuously as B is the Leader ?
> >
> > Yes, B the leader. C Server is completely isolated from the Leader(B)
> > and It cannot communicate with the Leader. C cannot continuously connect
> to
> > the
> >
> > Leader.
> >
> >
> >  > 2) C3 is connected C. Please tell me the connection host string passed
> > to
> >  >     this client. Does it contains all three servers info
> "A:clientport,
> >  >    B:clientport, C:clientport" ?
> >
> > Yes, C3's connection string contains all three servers. ("A:clientport,
> > B:clientport, C:clientport")
> >
> >
> >  > 3) Please check all three servers and client C3 logs to see any
> >  >    inconsistencies or exceptions.
> >
> > After looking at logs, it seems when the server C isolated from the
> Leader,
> >
> > a disconnect event fires to client C3. Then it (C3) tries too much time
> to
> > connect to Server B(Leader) .
> >
> > But it cannot connect to server B, as we blocked the connection between
> > Server C and
> >
> > Server B. Basically, C3 tries more than half of the session timeout time
> to
> > connect to Server B.
> >
> > Then after figuring out that C3 cannot to connect to Server B, it tries
> to
> > connect
> >
> > to Server A, and it connects to Server A successfully. But this is too
> > late, because
> >
> > session is already expired at the time C3 connected.
> >
> > And this happens sometimes only. Because when we specify all the servers
> in
> > the client's
> >
> > connect string, sometimes after C3 disconnecting from Server C, instead
> of
> > trying to connect to
> >
> > Server B it connects to Server A as the first attempt. In this case the
> > client C3 connects to the
> >
> > quorum successfully before the session expiration.
> >
> > According to my understanding, it looks like, when a client trying to
> > connect to a server that it cannot
> >
> > connect due to a network partitioning, it uses a blocking call and it
> waits
> > too much time trying to
> >
> > connect to a server that it cannot communicate.
> >
> >
> >
> >  > 4) ZooKeeper version used in your testing ?
> >
> > I used zookeeper 3.4.9 (current stable release)
> >
> >
> >
> > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <[hidden email]
> >
> > wrote:
> >
> > > Hi,
> > >
> > > Could you please give few more details,
> > >
> > > ​​
> > > 1) Could you tell me the status of Server C, is this lost connection to
> > the
> > > quorum and fails to join quorum continuously as B is the Leader ?
> > >
> > > 2) C3 is connected C. Please tell me the connection host string passed
> to
> > > this client. Does it contains all three servers info "A:clientport,
> > > B:clientport, C:clientport" ?
> > >
> > > 3) Please check all three servers and client C3 logs to see any
> > > inconsistencies or exceptions.
> > >
> > > 4) ZooKeeper version used in your testing ?
> > >
> > >
> > > Rakesh
> > >
> > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> > [hidden email]
> > > >
> > > wrote:
> > >
> > > > ​Recently, carried out a test to to find the behavior of clients
> when a
> > > > client is partitioned from the ensemble.
> > > >
> > > > Here I used a ensemble of 3 zookeeper servers called A, B and C. And
> > > quorum
> > > > was set up like below.
> > > >
> > > > A - Follower
> > > > B - Leader
> > > > C - Follower​
> > > >
> > > > A  <---> B <---> C
> > > >    \____________/
> > > >
> > > > And 3 clients are connected to ensemble like below.
> > > >
> > > > C1 is connected A
> > > > C2 is connected B
> > > > C3 is connected C.
> > > >
> > > > I used iptables to remove the network link between B and C.
> > > >
> > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > > >
> > > > After removing the link connections looks like below.
> > > >
> > > > A  <----> B         C
> > > >    \____________/
> > > >
> > > > Simply there is no way to communicate from B to C and vice versa.
> > > >
> > > > Here What I noticed is that the client connected to Zookeeper Server
> > "C",
> > > > could not connect to the ensemble resulting a session expiration
> > timeout.
> > > >
> > > > For this experiment I used tickTime of 3000ms and client session
> > > expiration
> > > > timeout of 45000ms. And tested with different combinations also.
> > > >
> > > > Can someone please explain what is the root cause for this behavior?
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Rakesh Radhakrishnan-2
>>> You mentioned that a client sends a ping every 1/3 the session timeout.
Yes, you are correct. Again to analyse your issue, we have to consider
re-connection timeout also, which is "sessiontimeout/listed servers count"
https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1292
https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1098

Coincidentally, in your example both heartbeat interval and re-connection
interval are same as you have three servers.

>>>>> It looks like C3 has taken 14 seconds to determine the disconnected
event
>>>>> and another 14 seconds to that it cannot connect to Server B(C3 is
isolated
>>>>> from B).

With this info, total elapsed time is 28 secs which is less than 45 secs
session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
period to re-establish a connection with server A, right? Could you please
check whether the client is connecting to A during this period?

Rakesh



On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <[hidden email]>
wrote:

> ​​
> Hi Rakesh,
>
> First of all thank you for the quick reply.
>
> >>>>> Actually, ZooKeeper client has retry mechanism.
> >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is the no.
> of listed servers, A, B, C) and then looks for a response before another
> 1/3 elapses. This allows time to reconnect to a different server (and still
> maintain the session) if the connected server becomes unavailable.
>
> You mentioned that a client sends a ping every 1/3 the session timeout. And
> 3 is the no of listed servers.
>
> I doubt that. Because, I am using the C Binding and after inspecting the
> code it looks like that 3 is a hard coded value.
> Simply no matter what the number of clients, zk client biding is always
> sending a ping every 1/3 session timeout.
>
> Can please clarify that for me?
>
> Here I used a tick of 3000ms and session expiration timeout of 45000ms.
>
> And please find the screenshot of extacted client log outout.
>
> https://anonimag.es/image/JT9htnL
>
> It looks like C3 has taken 14 seconds to determine the disconnected event
> and another 14 seconds to that it cannot connect to Server B(C3 is isolated
> from B).
>
>
>
> On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <[hidden email]>
> wrote:
>
> > >>>> According to my understanding, it looks like, when a client trying
> to
> > >>>> connect to a server that it cannot connect due to a network
> > partitioning,
> > >>>> it uses a blocking call and it waits too much time trying to
> > >>>> connect to a server that it cannot communicate.
> >
> > Actually, ZooKeeper client has retry mechanism.
> > Client sends a ping every 1/3 the session timeout (here, 3 is the no. of
> > listed servers, A, B, C)
> > and then looks for a response before another 1/3 elapses. This allows
> time
> > to reconnect to a
> > different server (and still maintain the session) if the connected server
> > becomes unavailable.
> >
> > Could you grep the following log message in your client log and tell me
> how
> > much time C3 taken for the re-connection attempts.
> > "Client session timed out, have not heard from server in "
> >
> > C3 might have first attempted to reconnect to B and then A. Also, need to
> > check how much time C3 taken to detect connection failure from server C.
> >
> > Could you please share the zk client log to dig more.
> >
> > Rakesh
> >
> >
> > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> > [hidden email]>
> > wrote:
> >
> > >  > ​
> > > 1) Could you tell me the status of Server C, is this lost connection to
> > the
> > >  >     quorum and fails to join quorum continuously as B is the Leader
> ?
> > >
> > > Yes, B the leader. C Server is completely isolated from the Leader(B)
> > > and It cannot communicate with the Leader. C cannot continuously
> connect
> > to
> > > the
> > >
> > > Leader.
> > >
> > >
> > >  > 2) C3 is connected C. Please tell me the connection host string
> passed
> > > to
> > >  >     this client. Does it contains all three servers info
> > "A:clientport,
> > >  >    B:clientport, C:clientport" ?
> > >
> > > Yes, C3's connection string contains all three servers. ("A:clientport,
> > > B:clientport, C:clientport")
> > >
> > >
> > >  > 3) Please check all three servers and client C3 logs to see any
> > >  >    inconsistencies or exceptions.
> > >
> > > After looking at logs, it seems when the server C isolated from the
> > Leader,
> > >
> > > a disconnect event fires to client C3. Then it (C3) tries too much time
> > to
> > > connect to Server B(Leader) .
> > >
> > > But it cannot connect to server B, as we blocked the connection between
> > > Server C and
> > >
> > > Server B. Basically, C3 tries more than half of the session timeout
> time
> > to
> > > connect to Server B.
> > >
> > > Then after figuring out that C3 cannot to connect to Server B, it tries
> > to
> > > connect
> > >
> > > to Server A, and it connects to Server A successfully. But this is too
> > > late, because
> > >
> > > session is already expired at the time C3 connected.
> > >
> > > And this happens sometimes only. Because when we specify all the
> servers
> > in
> > > the client's
> > >
> > > connect string, sometimes after C3 disconnecting from Server C, instead
> > of
> > > trying to connect to
> > >
> > > Server B it connects to Server A as the first attempt. In this case the
> > > client C3 connects to the
> > >
> > > quorum successfully before the session expiration.
> > >
> > > According to my understanding, it looks like, when a client trying to
> > > connect to a server that it cannot
> > >
> > > connect due to a network partitioning, it uses a blocking call and it
> > waits
> > > too much time trying to
> > >
> > > connect to a server that it cannot communicate.
> > >
> > >
> > >
> > >  > 4) ZooKeeper version used in your testing ?
> > >
> > > I used zookeeper 3.4.9 (current stable release)
> > >
> > >
> > >
> > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
> [hidden email]
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Could you please give few more details,
> > > >
> > > > ​​
> > > > 1) Could you tell me the status of Server C, is this lost connection
> to
> > > the
> > > > quorum and fails to join quorum continuously as B is the Leader ?
> > > >
> > > > 2) C3 is connected C. Please tell me the connection host string
> passed
> > to
> > > > this client. Does it contains all three servers info "A:clientport,
> > > > B:clientport, C:clientport" ?
> > > >
> > > > 3) Please check all three servers and client C3 logs to see any
> > > > inconsistencies or exceptions.
> > > >
> > > > 4) ZooKeeper version used in your testing ?
> > > >
> > > >
> > > > Rakesh
> > > >
> > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> > > [hidden email]
> > > > >
> > > > wrote:
> > > >
> > > > > ​Recently, carried out a test to to find the behavior of clients
> > when a
> > > > > client is partitioned from the ensemble.
> > > > >
> > > > > Here I used a ensemble of 3 zookeeper servers called A, B and C.
> And
> > > > quorum
> > > > > was set up like below.
> > > > >
> > > > > A - Follower
> > > > > B - Leader
> > > > > C - Follower​
> > > > >
> > > > > A  <---> B <---> C
> > > > >    \____________/
> > > > >
> > > > > And 3 clients are connected to ensemble like below.
> > > > >
> > > > > C1 is connected A
> > > > > C2 is connected B
> > > > > C3 is connected C.
> > > > >
> > > > > I used iptables to remove the network link between B and C.
> > > > >
> > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > > > >
> > > > > After removing the link connections looks like below.
> > > > >
> > > > > A  <----> B         C
> > > > >    \____________/
> > > > >
> > > > > Simply there is no way to communicate from B to C and vice versa.
> > > > >
> > > > > Here What I noticed is that the client connected to Zookeeper
> Server
> > > "C",
> > > > > could not connect to the ensemble resulting a session expiration
> > > timeout.
> > > > >
> > > > > For this experiment I used tickTime of 3000ms and client session
> > > > expiration
> > > > > timeout of 45000ms. And tested with different combinations also.
> > > > >
> > > > > Can someone please explain what is the root cause for this
> behavior?
> > > > >
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
Hi Rakesh,

I have made a small mistake in the above email.
For client's session timeout I have used 20000ms not 45000ms.
Sorry for the mistake.
Basically it waits for 28 seconds to connect to Server A. (14 seconds to
figure out disconnection and another 14 seconds to connect to Server B)




Then as soon as it connected to Server A, Server A sends the session
expiration.
Then I again did the same experiment with setting client's session
expiration time to 45000ms.
Then it took 15secs to retrieve the disconnection event to client.
After from that point, client took another 30secs to trying to connect to
Server B.
And then after that, client figures out that it cannot connect to Server B
and try to initiate a connection to Server A.
Since 15 + 30 secs is 45, when as soon as client connected to Server A,
Server A sends the session expiration event.
What is the reason for that? It looks like client is trying too much time
to connect to Server B.


Here I attached the Server B's log (Leader) and client's log.
Server A = 172.25.83.201
Server B = 172.25.83.204
Server C = 172.25.83.205
And client's session id is 0x35a926acae80000

https://drive.google.com/file/d/0B9o9GJ_CoG1_NWRTQ3hmeGpZRktrc040MkZzTk5LTDdpMi1v/view?usp=sharing
https://drive.google.com/file/d/0B9o9GJ_CoG1_VHlKa092QzB6M3NPemRORk9Lc0w1a2N5ZDdz/view?usp=sharing

And also in the above post, you mentioned that, you mentioned about a
connection-timeout.
Can you please explain a bit more?

I see that when I try with both 45000ms and 20000ms, it takes around 14 -
15 seconds to figure out a disconnection from clients end.
Looks like it is a constant and client has (session timeout - connection
loss notified time) to find a new Server. Am I correct here?
You mentioned that client connection timeout is equal to
("sessiontimeout/listed servers count").
Please explain a bit about the client ping times and connection timeout.
How these to co-relate?

On Thu, Mar 2, 2017 at 8:18 PM, Rakesh Radhakrishnan <[hidden email]>
wrote:

> >>> You mentioned that a client sends a ping every 1/3 the session timeout.
> Yes, you are correct. Again to analyse your issue, we have to consider
> re-connection timeout also, which is "sessiontimeout/listed servers count"
> https://github.com/apache/zookeeper/blob/branch-3.4/src/
> java/main/org/apache/zookeeper/ClientCnxn.java#L1292
> https://github.com/apache/zookeeper/blob/branch-3.4/src/
> java/main/org/apache/zookeeper/ClientCnxn.java#L1098
>
> Coincidentally, in your example both heartbeat interval and re-connection
> interval are same as you have three servers.
>
> >>>>> It looks like C3 has taken 14 seconds to determine the disconnected
> event
> >>>>> and another 14 seconds to that it cannot connect to Server B(C3 is
> isolated
> >>>>> from B).
>
> With this info, total elapsed time is 28 secs which is less than 45 secs
> session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
> period to re-establish a connection with server A, right? Could you please
> check whether the client is connecting to A during this period?
>
> Rakesh
>
>
>
> On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <[hidden email]
> >
> wrote:
>
> > ​​
> > Hi Rakesh,
> >
> > First of all thank you for the quick reply.
> >
> > >>>>> Actually, ZooKeeper client has retry mechanism.
> > >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is the
> no.
> > of listed servers, A, B, C) and then looks for a response before another
> > 1/3 elapses. This allows time to reconnect to a different server (and
> still
> > maintain the session) if the connected server becomes unavailable.
> >
> > You mentioned that a client sends a ping every 1/3 the session timeout.
> And
> > 3 is the no of listed servers.
> >
> > I doubt that. Because, I am using the C Binding and after inspecting the
> > code it looks like that 3 is a hard coded value.
> > Simply no matter what the number of clients, zk client biding is always
> > sending a ping every 1/3 session timeout.
> >
> > Can please clarify that for me?
> >
> > Here I used a tick of 3000ms and session expiration timeout of 45000ms.
> >
> > And please find the screenshot of extacted client log outout.
> >
> > https://anonimag.es/image/JT9htnL
> >
> > It looks like C3 has taken 14 seconds to determine the disconnected event
> > and another 14 seconds to that it cannot connect to Server B(C3 is
> isolated
> > from B).
> >
> >
> >
> > On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <[hidden email]
> >
> > wrote:
> >
> > > >>>> According to my understanding, it looks like, when a client trying
> > to
> > > >>>> connect to a server that it cannot connect due to a network
> > > partitioning,
> > > >>>> it uses a blocking call and it waits too much time trying to
> > > >>>> connect to a server that it cannot communicate.
> > >
> > > Actually, ZooKeeper client has retry mechanism.
> > > Client sends a ping every 1/3 the session timeout (here, 3 is the no.
> of
> > > listed servers, A, B, C)
> > > and then looks for a response before another 1/3 elapses. This allows
> > time
> > > to reconnect to a
> > > different server (and still maintain the session) if the connected
> server
> > > becomes unavailable.
> > >
> > > Could you grep the following log message in your client log and tell me
> > how
> > > much time C3 taken for the re-connection attempts.
> > > "Client session timed out, have not heard from server in "
> > >
> > > C3 might have first attempted to reconnect to B and then A. Also, need
> to
> > > check how much time C3 taken to detect connection failure from server
> C.
> > >
> > > Could you please share the zk client log to dig more.
> > >
> > > Rakesh
> > >
> > >
> > > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> > > [hidden email]>
> > > wrote:
> > >
> > > >  > ​
> > > > 1) Could you tell me the status of Server C, is this lost connection
> to
> > > the
> > > >  >     quorum and fails to join quorum continuously as B is the
> Leader
> > ?
> > > >
> > > > Yes, B the leader. C Server is completely isolated from the Leader(B)
> > > > and It cannot communicate with the Leader. C cannot continuously
> > connect
> > > to
> > > > the
> > > >
> > > > Leader.
> > > >
> > > >
> > > >  > 2) C3 is connected C. Please tell me the connection host string
> > passed
> > > > to
> > > >  >     this client. Does it contains all three servers info
> > > "A:clientport,
> > > >  >    B:clientport, C:clientport" ?
> > > >
> > > > Yes, C3's connection string contains all three servers.
> ("A:clientport,
> > > > B:clientport, C:clientport")
> > > >
> > > >
> > > >  > 3) Please check all three servers and client C3 logs to see any
> > > >  >    inconsistencies or exceptions.
> > > >
> > > > After looking at logs, it seems when the server C isolated from the
> > > Leader,
> > > >
> > > > a disconnect event fires to client C3. Then it (C3) tries too much
> time
> > > to
> > > > connect to Server B(Leader) .
> > > >
> > > > But it cannot connect to server B, as we blocked the connection
> between
> > > > Server C and
> > > >
> > > > Server B. Basically, C3 tries more than half of the session timeout
> > time
> > > to
> > > > connect to Server B.
> > > >
> > > > Then after figuring out that C3 cannot to connect to Server B, it
> tries
> > > to
> > > > connect
> > > >
> > > > to Server A, and it connects to Server A successfully. But this is
> too
> > > > late, because
> > > >
> > > > session is already expired at the time C3 connected.
> > > >
> > > > And this happens sometimes only. Because when we specify all the
> > servers
> > > in
> > > > the client's
> > > >
> > > > connect string, sometimes after C3 disconnecting from Server C,
> instead
> > > of
> > > > trying to connect to
> > > >
> > > > Server B it connects to Server A as the first attempt. In this case
> the
> > > > client C3 connects to the
> > > >
> > > > quorum successfully before the session expiration.
> > > >
> > > > According to my understanding, it looks like, when a client trying to
> > > > connect to a server that it cannot
> > > >
> > > > connect due to a network partitioning, it uses a blocking call and it
> > > waits
> > > > too much time trying to
> > > >
> > > > connect to a server that it cannot communicate.
> > > >
> > > >
> > > >
> > > >  > 4) ZooKeeper version used in your testing ?
> > > >
> > > > I used zookeeper 3.4.9 (current stable release)
> > > >
> > > >
> > > >
> > > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
> > [hidden email]
> > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Could you please give few more details,
> > > > >
> > > > > ​​
> > > > > 1) Could you tell me the status of Server C, is this lost
> connection
> > to
> > > > the
> > > > > quorum and fails to join quorum continuously as B is the Leader ?
> > > > >
> > > > > 2) C3 is connected C. Please tell me the connection host string
> > passed
> > > to
> > > > > this client. Does it contains all three servers info "A:clientport,
> > > > > B:clientport, C:clientport" ?
> > > > >
> > > > > 3) Please check all three servers and client C3 logs to see any
> > > > > inconsistencies or exceptions.
> > > > >
> > > > > 4) ZooKeeper version used in your testing ?
> > > > >
> > > > >
> > > > > Rakesh
> > > > >
> > > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> > > > [hidden email]
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > ​Recently, carried out a test to to find the behavior of clients
> > > when a
> > > > > > client is partitioned from the ensemble.
> > > > > >
> > > > > > Here I used a ensemble of 3 zookeeper servers called A, B and C.
> > And
> > > > > quorum
> > > > > > was set up like below.
> > > > > >
> > > > > > A - Follower
> > > > > > B - Leader
> > > > > > C - Follower​
> > > > > >
> > > > > > A  <---> B <---> C
> > > > > >    \____________/
> > > > > >
> > > > > > And 3 clients are connected to ensemble like below.
> > > > > >
> > > > > > C1 is connected A
> > > > > > C2 is connected B
> > > > > > C3 is connected C.
> > > > > >
> > > > > > I used iptables to remove the network link between B and C.
> > > > > >
> > > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > > > > >
> > > > > > After removing the link connections looks like below.
> > > > > >
> > > > > > A  <----> B         C
> > > > > >    \____________/
> > > > > >
> > > > > > Simply there is no way to communicate from B to C and vice versa.
> > > > > >
> > > > > > Here What I noticed is that the client connected to Zookeeper
> > Server
> > > > "C",
> > > > > > could not connect to the ensemble resulting a session expiration
> > > > timeout.
> > > > > >
> > > > > > For this experiment I used tickTime of 3000ms and client session
> > > > > expiration
> > > > > > timeout of 45000ms. And tested with different combinations also.
> > > > > >
> > > > > > Can someone please explain what is the root cause for this
> > behavior?
> > > > > >
> > > > >
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
Hi Rakesh,

Any updates on this?

Is this a bug or an expected situation?

Thanks



On Fri, Mar 3, 2017 at 10:49 AM, Tharindu Kumara <[hidden email]>
wrote:

> Hi Rakesh,
>
> I have made a small mistake in the above email.
> For client's session timeout I have used 20000ms not 45000ms.
> Sorry for the mistake.
> Basically it waits for 28 seconds to connect to Server A. (14 seconds to
> figure out disconnection and another 14 seconds to connect to Server B)
>
>
>
>
> Then as soon as it connected to Server A, Server A sends the session
> expiration.
> Then I again did the same experiment with setting client's session
> expiration time to 45000ms.
> Then it took 15secs to retrieve the disconnection event to client.
> After from that point, client took another 30secs to trying to connect to
> Server B.
> And then after that, client figures out that it cannot connect to Server B
> and try to initiate a connection to Server A.
> Since 15 + 30 secs is 45, when as soon as client connected to Server A,
> Server A sends the session expiration event.
> What is the reason for that? It looks like client is trying too much time
> to connect to Server B.
>
>
> Here I attached the Server B's log (Leader) and client's log.
> Server A = 172.25.83.201
> Server B = 172.25.83.204
> Server C = 172.25.83.205
> And client's session id is 0x35a926acae80000
>
> https://drive.google.com/file/d/0B9o9GJ_CoG1_
> NWRTQ3hmeGpZRktrc040MkZzTk5LTDdpMi1v/view?usp=sharing
> https://drive.google.com/file/d/0B9o9GJ_CoG1_
> VHlKa092QzB6M3NPemRORk9Lc0w1a2N5ZDdz/view?usp=sharing
>
> And also in the above post, you mentioned that, you mentioned about a
> connection-timeout.
> Can you please explain a bit more?
>
> I see that when I try with both 45000ms and 20000ms, it takes around 14 -
> 15 seconds to figure out a disconnection from clients end.
> Looks like it is a constant and client has (session timeout - connection
> loss notified time) to find a new Server. Am I correct here?
> You mentioned that client connection timeout is equal to
> ("sessiontimeout/listed servers count").
> Please explain a bit about the client ping times and connection timeout.
> How these to co-relate?
>
> On Thu, Mar 2, 2017 at 8:18 PM, Rakesh Radhakrishnan <[hidden email]>
> wrote:
>
>> >>> You mentioned that a client sends a ping every 1/3 the session
>> timeout.
>> Yes, you are correct. Again to analyse your issue, we have to consider
>> re-connection timeout also, which is "sessiontimeout/listed servers count"
>> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
>> /main/org/apache/zookeeper/ClientCnxn.java#L1292
>> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
>> /main/org/apache/zookeeper/ClientCnxn.java#L1098
>>
>> Coincidentally, in your example both heartbeat interval and re-connection
>> interval are same as you have three servers.
>>
>> >>>>> It looks like C3 has taken 14 seconds to determine the disconnected
>> event
>> >>>>> and another 14 seconds to that it cannot connect to Server B(C3 is
>> isolated
>> >>>>> from B).
>>
>> With this info, total elapsed time is 28 secs which is less than 45 secs
>> session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
>> period to re-establish a connection with server A, right? Could you please
>> check whether the client is connecting to A during this period?
>>
>> Rakesh
>>
>>
>>
>> On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <
>> [hidden email]>
>> wrote:
>>
>> > ​​
>> > Hi Rakesh,
>> >
>> > First of all thank you for the quick reply.
>> >
>> > >>>>> Actually, ZooKeeper client has retry mechanism.
>> > >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is the
>> no.
>> > of listed servers, A, B, C) and then looks for a response before another
>> > 1/3 elapses. This allows time to reconnect to a different server (and
>> still
>> > maintain the session) if the connected server becomes unavailable.
>> >
>> > You mentioned that a client sends a ping every 1/3 the session timeout.
>> And
>> > 3 is the no of listed servers.
>> >
>> > I doubt that. Because, I am using the C Binding and after inspecting the
>> > code it looks like that 3 is a hard coded value.
>> > Simply no matter what the number of clients, zk client biding is always
>> > sending a ping every 1/3 session timeout.
>> >
>> > Can please clarify that for me?
>> >
>> > Here I used a tick of 3000ms and session expiration timeout of 45000ms.
>> >
>> > And please find the screenshot of extacted client log outout.
>> >
>> > https://anonimag.es/image/JT9htnL
>> >
>> > It looks like C3 has taken 14 seconds to determine the disconnected
>> event
>> > and another 14 seconds to that it cannot connect to Server B(C3 is
>> isolated
>> > from B).
>> >
>> >
>> >
>> > On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <
>> [hidden email]>
>> > wrote:
>> >
>> > > >>>> According to my understanding, it looks like, when a client
>> trying
>> > to
>> > > >>>> connect to a server that it cannot connect due to a network
>> > > partitioning,
>> > > >>>> it uses a blocking call and it waits too much time trying to
>> > > >>>> connect to a server that it cannot communicate.
>> > >
>> > > Actually, ZooKeeper client has retry mechanism.
>> > > Client sends a ping every 1/3 the session timeout (here, 3 is the no.
>> of
>> > > listed servers, A, B, C)
>> > > and then looks for a response before another 1/3 elapses. This allows
>> > time
>> > > to reconnect to a
>> > > different server (and still maintain the session) if the connected
>> server
>> > > becomes unavailable.
>> > >
>> > > Could you grep the following log message in your client log and tell
>> me
>> > how
>> > > much time C3 taken for the re-connection attempts.
>> > > "Client session timed out, have not heard from server in "
>> > >
>> > > C3 might have first attempted to reconnect to B and then A. Also,
>> need to
>> > > check how much time C3 taken to detect connection failure from server
>> C.
>> > >
>> > > Could you please share the zk client log to dig more.
>> > >
>> > > Rakesh
>> > >
>> > >
>> > > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
>> > > [hidden email]>
>> > > wrote:
>> > >
>> > > >  > ​
>> > > > 1) Could you tell me the status of Server C, is this lost
>> connection to
>> > > the
>> > > >  >     quorum and fails to join quorum continuously as B is the
>> Leader
>> > ?
>> > > >
>> > > > Yes, B the leader. C Server is completely isolated from the
>> Leader(B)
>> > > > and It cannot communicate with the Leader. C cannot continuously
>> > connect
>> > > to
>> > > > the
>> > > >
>> > > > Leader.
>> > > >
>> > > >
>> > > >  > 2) C3 is connected C. Please tell me the connection host string
>> > passed
>> > > > to
>> > > >  >     this client. Does it contains all three servers info
>> > > "A:clientport,
>> > > >  >    B:clientport, C:clientport" ?
>> > > >
>> > > > Yes, C3's connection string contains all three servers.
>> ("A:clientport,
>> > > > B:clientport, C:clientport")
>> > > >
>> > > >
>> > > >  > 3) Please check all three servers and client C3 logs to see any
>> > > >  >    inconsistencies or exceptions.
>> > > >
>> > > > After looking at logs, it seems when the server C isolated from the
>> > > Leader,
>> > > >
>> > > > a disconnect event fires to client C3. Then it (C3) tries too much
>> time
>> > > to
>> > > > connect to Server B(Leader) .
>> > > >
>> > > > But it cannot connect to server B, as we blocked the connection
>> between
>> > > > Server C and
>> > > >
>> > > > Server B. Basically, C3 tries more than half of the session timeout
>> > time
>> > > to
>> > > > connect to Server B.
>> > > >
>> > > > Then after figuring out that C3 cannot to connect to Server B, it
>> tries
>> > > to
>> > > > connect
>> > > >
>> > > > to Server A, and it connects to Server A successfully. But this is
>> too
>> > > > late, because
>> > > >
>> > > > session is already expired at the time C3 connected.
>> > > >
>> > > > And this happens sometimes only. Because when we specify all the
>> > servers
>> > > in
>> > > > the client's
>> > > >
>> > > > connect string, sometimes after C3 disconnecting from Server C,
>> instead
>> > > of
>> > > > trying to connect to
>> > > >
>> > > > Server B it connects to Server A as the first attempt. In this case
>> the
>> > > > client C3 connects to the
>> > > >
>> > > > quorum successfully before the session expiration.
>> > > >
>> > > > According to my understanding, it looks like, when a client trying
>> to
>> > > > connect to a server that it cannot
>> > > >
>> > > > connect due to a network partitioning, it uses a blocking call and
>> it
>> > > waits
>> > > > too much time trying to
>> > > >
>> > > > connect to a server that it cannot communicate.
>> > > >
>> > > >
>> > > >
>> > > >  > 4) ZooKeeper version used in your testing ?
>> > > >
>> > > > I used zookeeper 3.4.9 (current stable release)
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
>> > [hidden email]
>> > > >
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Could you please give few more details,
>> > > > >
>> > > > > ​​
>> > > > > 1) Could you tell me the status of Server C, is this lost
>> connection
>> > to
>> > > > the
>> > > > > quorum and fails to join quorum continuously as B is the Leader ?
>> > > > >
>> > > > > 2) C3 is connected C. Please tell me the connection host string
>> > passed
>> > > to
>> > > > > this client. Does it contains all three servers info
>> "A:clientport,
>> > > > > B:clientport, C:clientport" ?
>> > > > >
>> > > > > 3) Please check all three servers and client C3 logs to see any
>> > > > > inconsistencies or exceptions.
>> > > > >
>> > > > > 4) ZooKeeper version used in your testing ?
>> > > > >
>> > > > >
>> > > > > Rakesh
>> > > > >
>> > > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
>> > > > [hidden email]
>> > > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > ​Recently, carried out a test to to find the behavior of clients
>> > > when a
>> > > > > > client is partitioned from the ensemble.
>> > > > > >
>> > > > > > Here I used a ensemble of 3 zookeeper servers called A, B and C.
>> > And
>> > > > > quorum
>> > > > > > was set up like below.
>> > > > > >
>> > > > > > A - Follower
>> > > > > > B - Leader
>> > > > > > C - Follower​
>> > > > > >
>> > > > > > A  <---> B <---> C
>> > > > > >    \____________/
>> > > > > >
>> > > > > > And 3 clients are connected to ensemble like below.
>> > > > > >
>> > > > > > C1 is connected A
>> > > > > > C2 is connected B
>> > > > > > C3 is connected C.
>> > > > > >
>> > > > > > I used iptables to remove the network link between B and C.
>> > > > > >
>> > > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
>> > > > > >
>> > > > > > After removing the link connections looks like below.
>> > > > > >
>> > > > > > A  <----> B         C
>> > > > > >    \____________/
>> > > > > >
>> > > > > > Simply there is no way to communicate from B to C and vice
>> versa.
>> > > > > >
>> > > > > > Here What I noticed is that the client connected to Zookeeper
>> > Server
>> > > > "C",
>> > > > > > could not connect to the ensemble resulting a session expiration
>> > > > timeout.
>> > > > > >
>> > > > > > For this experiment I used tickTime of 3000ms and client session
>> > > > > expiration
>> > > > > > timeout of 45000ms. And tested with different combinations also.
>> > > > > >
>> > > > > > Can someone please explain what is the root cause for this
>> > behavior?
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Rakesh Radhakrishnan-2
Sorry, it was a busy weekend for me. I hope the following will give more
clarity.

(1) Client connected to a server and sends heartbeat periodically
Send pings 1/3 the session timeout
https://github.com/apache/zookeeper/blob/branch-3.4/src/c/src/zookeeper.c#L1675
In your case, sessiontimeout=45secs, then approax 15secs is the ping
interval

(2) Client connected to a server and the connected server is unavailable.
Missing ping response. Usually, this time period could be (ping interval +
small grace period, for idle time calc).
This allows time to reconnect to a different server.

In your case, timeout 45 secs, client took 15 secs to get the disconnection
event from server C.
Then, it took 30 secs to hit connection error from server B. This is very
suspicious and ideally client should receive an error within 1/3 timeout +
small grace period.
Any chance to check the status of the client machine during this 30 secs
time period. Do you think Kernel message can help ?

Below is, the session "0x35a926acae80000" related log from the shared
client log.
>>> zookeeper_client: 2017-03-03
09:56:14,770:23078(0x7f9467786700):ZOO_INFO@check_events@1775: session
establishment complete on server [172.25.83.205:2181],
sessionId=0x35a926acae80000, negotiated timeout=45000
>>> zookeeper_client: 2017-03-03
09:56:30,770:23078(0x7f9467786700):ZOO_DEBUG@zookeeper_process@2218: Got
ping response in 1 ms
>>> zookeeper_client: 2017-03-03
09:56:42,213:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1746:
Socket [172.25.83.205:2181] zk retcode=-4, errno=112(Host is down): failed
while receiving aserver response
>>> zookeeper_client: 2017-03-03
09:57:12,220:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1666:
Socket [172.25.83.204:2181] zk retcode=-7, errno=110(Connection timed out):
connection to 172.25.83.204:2181 timed out (exceeded timeout by 6ms)
>>> zookeeper_client: 2017-03-03
09:57:12,221:23078(0x7f9467786700):ZOO_INFO@check_events@1728: initiated
connection to server [172.25.83.201:2181]
>>> zookeeper_client: 2017-03-03
09:57:12,223:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1764:
Socket [172.25.83.201:2181] zk retcode=-112, errno=116(Stale file handle):
sessionId=0x35a926acae80000 has expired.

Rakesh

On Mon, Mar 6, 2017 at 10:11 AM, Tharindu Kumara <[hidden email]>
wrote:

> Hi Rakesh,
>
> Any updates on this?
>
> Is this a bug or an expected situation?
>
> Thanks
>
>
>
> On Fri, Mar 3, 2017 at 10:49 AM, Tharindu Kumara <
> [hidden email]>
> wrote:
>
> > Hi Rakesh,
> >
> > I have made a small mistake in the above email.
> > For client's session timeout I have used 20000ms not 45000ms.
> > Sorry for the mistake.
> > Basically it waits for 28 seconds to connect to Server A. (14 seconds to
> > figure out disconnection and another 14 seconds to connect to Server B)
> >
> >
> >
> >
> > Then as soon as it connected to Server A, Server A sends the session
> > expiration.
> > Then I again did the same experiment with setting client's session
> > expiration time to 45000ms.
> > Then it took 15secs to retrieve the disconnection event to client.
> > After from that point, client took another 30secs to trying to connect to
> > Server B.
> > And then after that, client figures out that it cannot connect to Server
> B
> > and try to initiate a connection to Server A.
> > Since 15 + 30 secs is 45, when as soon as client connected to Server A,
> > Server A sends the session expiration event.
> > What is the reason for that? It looks like client is trying too much time
> > to connect to Server B.
> >
> >
> > Here I attached the Server B's log (Leader) and client's log.
> > Server A = 172.25.83.201
> > Server B = 172.25.83.204
> > Server C = 172.25.83.205
> > And client's session id is 0x35a926acae80000
> >
> > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > NWRTQ3hmeGpZRktrc040MkZzTk5LTDdpMi1v/view?usp=sharing
> > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > VHlKa092QzB6M3NPemRORk9Lc0w1a2N5ZDdz/view?usp=sharing
> >
> > And also in the above post, you mentioned that, you mentioned about a
> > connection-timeout.
> > Can you please explain a bit more?
> >
> > I see that when I try with both 45000ms and 20000ms, it takes around 14 -
> > 15 seconds to figure out a disconnection from clients end.
> > Looks like it is a constant and client has (session timeout - connection
> > loss notified time) to find a new Server. Am I correct here?
> > You mentioned that client connection timeout is equal to
> > ("sessiontimeout/listed servers count").
> > Please explain a bit about the client ping times and connection timeout.
> > How these to co-relate?
> >
> > On Thu, Mar 2, 2017 at 8:18 PM, Rakesh Radhakrishnan <[hidden email]
> >
> > wrote:
> >
> >> >>> You mentioned that a client sends a ping every 1/3 the session
> >> timeout.
> >> Yes, you are correct. Again to analyse your issue, we have to consider
> >> re-connection timeout also, which is "sessiontimeout/listed servers
> count"
> >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> >> /main/org/apache/zookeeper/ClientCnxn.java#L1292
> >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> >> /main/org/apache/zookeeper/ClientCnxn.java#L1098
> >>
> >> Coincidentally, in your example both heartbeat interval and
> re-connection
> >> interval are same as you have three servers.
> >>
> >> >>>>> It looks like C3 has taken 14 seconds to determine the
> disconnected
> >> event
> >> >>>>> and another 14 seconds to that it cannot connect to Server B(C3 is
> >> isolated
> >> >>>>> from B).
> >>
> >> With this info, total elapsed time is 28 secs which is less than 45 secs
> >> session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
> >> period to re-establish a connection with server A, right? Could you
> please
> >> check whether the client is connecting to A during this period?
> >>
> >> Rakesh
> >>
> >>
> >>
> >> On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <
> >> [hidden email]>
> >> wrote:
> >>
> >> > ​​
> >> > Hi Rakesh,
> >> >
> >> > First of all thank you for the quick reply.
> >> >
> >> > >>>>> Actually, ZooKeeper client has retry mechanism.
> >> > >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is
> the
> >> no.
> >> > of listed servers, A, B, C) and then looks for a response before
> another
> >> > 1/3 elapses. This allows time to reconnect to a different server (and
> >> still
> >> > maintain the session) if the connected server becomes unavailable.
> >> >
> >> > You mentioned that a client sends a ping every 1/3 the session
> timeout.
> >> And
> >> > 3 is the no of listed servers.
> >> >
> >> > I doubt that. Because, I am using the C Binding and after inspecting
> the
> >> > code it looks like that 3 is a hard coded value.
> >> > Simply no matter what the number of clients, zk client biding is
> always
> >> > sending a ping every 1/3 session timeout.
> >> >
> >> > Can please clarify that for me?
> >> >
> >> > Here I used a tick of 3000ms and session expiration timeout of
> 45000ms.
> >> >
> >> > And please find the screenshot of extacted client log outout.
> >> >
> >> > https://anonimag.es/image/JT9htnL
> >> >
> >> > It looks like C3 has taken 14 seconds to determine the disconnected
> >> event
> >> > and another 14 seconds to that it cannot connect to Server B(C3 is
> >> isolated
> >> > from B).
> >> >
> >> >
> >> >
> >> > On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <
> >> [hidden email]>
> >> > wrote:
> >> >
> >> > > >>>> According to my understanding, it looks like, when a client
> >> trying
> >> > to
> >> > > >>>> connect to a server that it cannot connect due to a network
> >> > > partitioning,
> >> > > >>>> it uses a blocking call and it waits too much time trying to
> >> > > >>>> connect to a server that it cannot communicate.
> >> > >
> >> > > Actually, ZooKeeper client has retry mechanism.
> >> > > Client sends a ping every 1/3 the session timeout (here, 3 is the
> no.
> >> of
> >> > > listed servers, A, B, C)
> >> > > and then looks for a response before another 1/3 elapses. This
> allows
> >> > time
> >> > > to reconnect to a
> >> > > different server (and still maintain the session) if the connected
> >> server
> >> > > becomes unavailable.
> >> > >
> >> > > Could you grep the following log message in your client log and tell
> >> me
> >> > how
> >> > > much time C3 taken for the re-connection attempts.
> >> > > "Client session timed out, have not heard from server in "
> >> > >
> >> > > C3 might have first attempted to reconnect to B and then A. Also,
> >> need to
> >> > > check how much time C3 taken to detect connection failure from
> server
> >> C.
> >> > >
> >> > > Could you please share the zk client log to dig more.
> >> > >
> >> > > Rakesh
> >> > >
> >> > >
> >> > > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> >> > > [hidden email]>
> >> > > wrote:
> >> > >
> >> > > >  > ​
> >> > > > 1) Could you tell me the status of Server C, is this lost
> >> connection to
> >> > > the
> >> > > >  >     quorum and fails to join quorum continuously as B is the
> >> Leader
> >> > ?
> >> > > >
> >> > > > Yes, B the leader. C Server is completely isolated from the
> >> Leader(B)
> >> > > > and It cannot communicate with the Leader. C cannot continuously
> >> > connect
> >> > > to
> >> > > > the
> >> > > >
> >> > > > Leader.
> >> > > >
> >> > > >
> >> > > >  > 2) C3 is connected C. Please tell me the connection host string
> >> > passed
> >> > > > to
> >> > > >  >     this client. Does it contains all three servers info
> >> > > "A:clientport,
> >> > > >  >    B:clientport, C:clientport" ?
> >> > > >
> >> > > > Yes, C3's connection string contains all three servers.
> >> ("A:clientport,
> >> > > > B:clientport, C:clientport")
> >> > > >
> >> > > >
> >> > > >  > 3) Please check all three servers and client C3 logs to see any
> >> > > >  >    inconsistencies or exceptions.
> >> > > >
> >> > > > After looking at logs, it seems when the server C isolated from
> the
> >> > > Leader,
> >> > > >
> >> > > > a disconnect event fires to client C3. Then it (C3) tries too much
> >> time
> >> > > to
> >> > > > connect to Server B(Leader) .
> >> > > >
> >> > > > But it cannot connect to server B, as we blocked the connection
> >> between
> >> > > > Server C and
> >> > > >
> >> > > > Server B. Basically, C3 tries more than half of the session
> timeout
> >> > time
> >> > > to
> >> > > > connect to Server B.
> >> > > >
> >> > > > Then after figuring out that C3 cannot to connect to Server B, it
> >> tries
> >> > > to
> >> > > > connect
> >> > > >
> >> > > > to Server A, and it connects to Server A successfully. But this is
> >> too
> >> > > > late, because
> >> > > >
> >> > > > session is already expired at the time C3 connected.
> >> > > >
> >> > > > And this happens sometimes only. Because when we specify all the
> >> > servers
> >> > > in
> >> > > > the client's
> >> > > >
> >> > > > connect string, sometimes after C3 disconnecting from Server C,
> >> instead
> >> > > of
> >> > > > trying to connect to
> >> > > >
> >> > > > Server B it connects to Server A as the first attempt. In this
> case
> >> the
> >> > > > client C3 connects to the
> >> > > >
> >> > > > quorum successfully before the session expiration.
> >> > > >
> >> > > > According to my understanding, it looks like, when a client trying
> >> to
> >> > > > connect to a server that it cannot
> >> > > >
> >> > > > connect due to a network partitioning, it uses a blocking call and
> >> it
> >> > > waits
> >> > > > too much time trying to
> >> > > >
> >> > > > connect to a server that it cannot communicate.
> >> > > >
> >> > > >
> >> > > >
> >> > > >  > 4) ZooKeeper version used in your testing ?
> >> > > >
> >> > > > I used zookeeper 3.4.9 (current stable release)
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
> >> > [hidden email]
> >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > Could you please give few more details,
> >> > > > >
> >> > > > > ​​
> >> > > > > 1) Could you tell me the status of Server C, is this lost
> >> connection
> >> > to
> >> > > > the
> >> > > > > quorum and fails to join quorum continuously as B is the Leader
> ?
> >> > > > >
> >> > > > > 2) C3 is connected C. Please tell me the connection host string
> >> > passed
> >> > > to
> >> > > > > this client. Does it contains all three servers info
> >> "A:clientport,
> >> > > > > B:clientport, C:clientport" ?
> >> > > > >
> >> > > > > 3) Please check all three servers and client C3 logs to see any
> >> > > > > inconsistencies or exceptions.
> >> > > > >
> >> > > > > 4) ZooKeeper version used in your testing ?
> >> > > > >
> >> > > > >
> >> > > > > Rakesh
> >> > > > >
> >> > > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> >> > > > [hidden email]
> >> > > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > ​Recently, carried out a test to to find the behavior of
> clients
> >> > > when a
> >> > > > > > client is partitioned from the ensemble.
> >> > > > > >
> >> > > > > > Here I used a ensemble of 3 zookeeper servers called A, B and
> C.
> >> > And
> >> > > > > quorum
> >> > > > > > was set up like below.
> >> > > > > >
> >> > > > > > A - Follower
> >> > > > > > B - Leader
> >> > > > > > C - Follower​
> >> > > > > >
> >> > > > > > A  <---> B <---> C
> >> > > > > >    \____________/
> >> > > > > >
> >> > > > > > And 3 clients are connected to ensemble like below.
> >> > > > > >
> >> > > > > > C1 is connected A
> >> > > > > > C2 is connected B
> >> > > > > > C3 is connected C.
> >> > > > > >
> >> > > > > > I used iptables to remove the network link between B and C.
> >> > > > > >
> >> > > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> >> > > > > >
> >> > > > > > After removing the link connections looks like below.
> >> > > > > >
> >> > > > > > A  <----> B         C
> >> > > > > >    \____________/
> >> > > > > >
> >> > > > > > Simply there is no way to communicate from B to C and vice
> >> versa.
> >> > > > > >
> >> > > > > > Here What I noticed is that the client connected to Zookeeper
> >> > Server
> >> > > > "C",
> >> > > > > > could not connect to the ensemble resulting a session
> expiration
> >> > > > timeout.
> >> > > > > >
> >> > > > > > For this experiment I used tickTime of 3000ms and client
> session
> >> > > > > expiration
> >> > > > > > timeout of 45000ms. And tested with different combinations
> also.
> >> > > > > >
> >> > > > > > Can someone please explain what is the root cause for this
> >> > behavior?
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Sessions Expire due to Network partitioning in Zookeeper

Tharindu Kumara
Hi Rakesh,

Thank you so much for the reply.

The client program that I have used is a simple one which implements the
Leader Election. It creates a ephemeral node and watches it and once that
is deleted a new leader is selected.

Client Code
https://gist.github.com/tharindukumara/025c98ddbd406db34067a587c6d9533d

For this test, I deployed both the client and Server C in the same machine.


I think, you can also recreate the scenario, if you deploy the zookeeper
servers and clients as I mentioned.

This is the output of a gstack while executing the client program.

Thread 3 (Thread 0x7fa5a7a2d700 (LWP 9057)):
#0  0x00007fa5a7d27c1d in poll () from /lib64/libc.so.6
#1  0x00007fa5a88a34d8 in copy_watchers (clone=<optimized out>,
to=0x118e3b0, from=<optimized out>) at src/zk_hashtable.c:233
#2  add_for_event (ht=<optimized out>, path=<optimized out>,
list=<optimized out>) at src/zk_hashtable.c:265
#3  0x00007fa5a7c460f0 in [hidden email] () from
/lib64/libpthread.so.0
#4  0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fa5a722c700 (LWP 9058)):
#0  0x00007fa5a7a3a05f in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00007fa5a88a3686 in destroy_zk_hashtable (ht=0x118e514) at
src/zk_hashtable.c:143
#2  0x000000000118e3b0 in ?? ()
#3  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fa5a8cad740 (LWP 9056)):
#0  0x00007fa5a7d01ded in nanosleep () from /lib64/libc.so.6
#1  0x00007fa5a7d01c84 in sleep () from /lib64/libc.so.6
#2  0x000000000040213c in main ()

Can you please give me a proper reason for this scenario?

To me it looks like when the client connects to server B, it initiates a
blocking call to Server B. And since the port is open to communicate it
waits on it too much time.

Thanks
Tharindu



On Mon, Mar 6, 2017 at 12:00 PM, Rakesh Radhakrishnan <[hidden email]>
wrote:

> Sorry, it was a busy weekend for me. I hope the following will give more
> clarity.
>
> (1) Client connected to a server and sends heartbeat periodically
> Send pings 1/3 the session timeout
> https://github.com/apache/zookeeper/blob/branch-3.4/src/c/
> src/zookeeper.c#L1675
> In your case, sessiontimeout=45secs, then approax 15secs is the ping
> interval
>
> (2) Client connected to a server and the connected server is unavailable.
> Missing ping response. Usually, this time period could be (ping interval +
> small grace period, for idle time calc).
> This allows time to reconnect to a different server.
>
> In your case, timeout 45 secs, client took 15 secs to get the disconnection
> event from server C.
> Then, it took 30 secs to hit connection error from server B. This is very
> suspicious and ideally client should receive an error within 1/3 timeout +
> small grace period.
> Any chance to check the status of the client machine during this 30 secs
> time period. Do you think Kernel message can help ?
>
> Below is, the session "0x35a926acae80000" related log from the shared
> client log.
> >>> zookeeper_client: 2017-03-03
> 09:56:14,770:23078(0x7f9467786700):ZOO_INFO@check_events@1775: session
> establishment complete on server [172.25.83.205:2181],
> sessionId=0x35a926acae80000, negotiated timeout=45000
> >>> zookeeper_client: 2017-03-03
> 09:56:30,770:23078(0x7f9467786700):ZOO_DEBUG@zookeeper_process@2218: Got
> ping response in 1 ms
> >>> zookeeper_client: 2017-03-03
> 09:56:42,213:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1746:
> Socket [172.25.83.205:2181] zk retcode=-4, errno=112(Host is down): failed
> while receiving aserver response
> >>> zookeeper_client: 2017-03-03
> 09:57:12,220:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1666:
> Socket [172.25.83.204:2181] zk retcode=-7, errno=110(Connection timed
> out):
> connection to 172.25.83.204:2181 timed out (exceeded timeout by 6ms)
> >>> zookeeper_client: 2017-03-03
> 09:57:12,221:23078(0x7f9467786700):ZOO_INFO@check_events@1728: initiated
> connection to server [172.25.83.201:2181]
> >>> zookeeper_client: 2017-03-03
> 09:57:12,223:23078(0x7f9467786700):ZOO_ERROR@handle_socket_error_msg@1764:
> Socket [172.25.83.201:2181] zk retcode=-112, errno=116(Stale file handle):
> sessionId=0x35a926acae80000 has expired.
>
> Rakesh
>
> On Mon, Mar 6, 2017 at 10:11 AM, Tharindu Kumara <
> [hidden email]>
> wrote:
>
> > Hi Rakesh,
> >
> > Any updates on this?
> >
> > Is this a bug or an expected situation?
> >
> > Thanks
> >
> >
> >
> > On Fri, Mar 3, 2017 at 10:49 AM, Tharindu Kumara <
> > [hidden email]>
> > wrote:
> >
> > > Hi Rakesh,
> > >
> > > I have made a small mistake in the above email.
> > > For client's session timeout I have used 20000ms not 45000ms.
> > > Sorry for the mistake.
> > > Basically it waits for 28 seconds to connect to Server A. (14 seconds
> to
> > > figure out disconnection and another 14 seconds to connect to Server B)
> > >
> > >
> > >
> > >
> > > Then as soon as it connected to Server A, Server A sends the session
> > > expiration.
> > > Then I again did the same experiment with setting client's session
> > > expiration time to 45000ms.
> > > Then it took 15secs to retrieve the disconnection event to client.
> > > After from that point, client took another 30secs to trying to connect
> to
> > > Server B.
> > > And then after that, client figures out that it cannot connect to
> Server
> > B
> > > and try to initiate a connection to Server A.
> > > Since 15 + 30 secs is 45, when as soon as client connected to Server A,
> > > Server A sends the session expiration event.
> > > What is the reason for that? It looks like client is trying too much
> time
> > > to connect to Server B.
> > >
> > >
> > > Here I attached the Server B's log (Leader) and client's log.
> > > Server A = 172.25.83.201
> > > Server B = 172.25.83.204
> > > Server C = 172.25.83.205
> > > And client's session id is 0x35a926acae80000
> > >
> > > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > > NWRTQ3hmeGpZRktrc040MkZzTk5LTDdpMi1v/view?usp=sharing
> > > https://drive.google.com/file/d/0B9o9GJ_CoG1_
> > > VHlKa092QzB6M3NPemRORk9Lc0w1a2N5ZDdz/view?usp=sharing
> > >
> > > And also in the above post, you mentioned that, you mentioned about a
> > > connection-timeout.
> > > Can you please explain a bit more?
> > >
> > > I see that when I try with both 45000ms and 20000ms, it takes around
> 14 -
> > > 15 seconds to figure out a disconnection from clients end.
> > > Looks like it is a constant and client has (session timeout -
> connection
> > > loss notified time) to find a new Server. Am I correct here?
> > > You mentioned that client connection timeout is equal to
> > > ("sessiontimeout/listed servers count").
> > > Please explain a bit about the client ping times and connection
> timeout.
> > > How these to co-relate?
> > >
> > > On Thu, Mar 2, 2017 at 8:18 PM, Rakesh Radhakrishnan <
> [hidden email]
> > >
> > > wrote:
> > >
> > >> >>> You mentioned that a client sends a ping every 1/3 the session
> > >> timeout.
> > >> Yes, you are correct. Again to analyse your issue, we have to consider
> > >> re-connection timeout also, which is "sessiontimeout/listed servers
> > count"
> > >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> > >> /main/org/apache/zookeeper/ClientCnxn.java#L1292
> > >> https://github.com/apache/zookeeper/blob/branch-3.4/src/java
> > >> /main/org/apache/zookeeper/ClientCnxn.java#L1098
> > >>
> > >> Coincidentally, in your example both heartbeat interval and
> > re-connection
> > >> interval are same as you have three servers.
> > >>
> > >> >>>>> It looks like C3 has taken 14 seconds to determine the
> > disconnected
> > >> event
> > >> >>>>> and another 14 seconds to that it cannot connect to Server B(C3
> is
> > >> isolated
> > >> >>>>> from B).
> > >>
> > >> With this info, total elapsed time is 28 secs which is less than 45
> secs
> > >> session timeout. Now, the client has 17 secs (45 secs - 28 secs) time
> > >> period to re-establish a connection with server A, right? Could you
> > please
> > >> check whether the client is connecting to A during this period?
> > >>
> > >> Rakesh
> > >>
> > >>
> > >>
> > >> On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara <
> > >> [hidden email]>
> > >> wrote:
> > >>
> > >> > ​​
> > >> > Hi Rakesh,
> > >> >
> > >> > First of all thank you for the quick reply.
> > >> >
> > >> > >>>>> Actually, ZooKeeper client has retry mechanism.
> > >> > >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is
> > the
> > >> no.
> > >> > of listed servers, A, B, C) and then looks for a response before
> > another
> > >> > 1/3 elapses. This allows time to reconnect to a different server
> (and
> > >> still
> > >> > maintain the session) if the connected server becomes unavailable.
> > >> >
> > >> > You mentioned that a client sends a ping every 1/3 the session
> > timeout.
> > >> And
> > >> > 3 is the no of listed servers.
> > >> >
> > >> > I doubt that. Because, I am using the C Binding and after inspecting
> > the
> > >> > code it looks like that 3 is a hard coded value.
> > >> > Simply no matter what the number of clients, zk client biding is
> > always
> > >> > sending a ping every 1/3 session timeout.
> > >> >
> > >> > Can please clarify that for me?
> > >> >
> > >> > Here I used a tick of 3000ms and session expiration timeout of
> > 45000ms.
> > >> >
> > >> > And please find the screenshot of extacted client log outout.
> > >> >
> > >> > https://anonimag.es/image/JT9htnL
> > >> >
> > >> > It looks like C3 has taken 14 seconds to determine the disconnected
> > >> event
> > >> > and another 14 seconds to that it cannot connect to Server B(C3 is
> > >> isolated
> > >> > from B).
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan <
> > >> [hidden email]>
> > >> > wrote:
> > >> >
> > >> > > >>>> According to my understanding, it looks like, when a client
> > >> trying
> > >> > to
> > >> > > >>>> connect to a server that it cannot connect due to a network
> > >> > > partitioning,
> > >> > > >>>> it uses a blocking call and it waits too much time trying to
> > >> > > >>>> connect to a server that it cannot communicate.
> > >> > >
> > >> > > Actually, ZooKeeper client has retry mechanism.
> > >> > > Client sends a ping every 1/3 the session timeout (here, 3 is the
> > no.
> > >> of
> > >> > > listed servers, A, B, C)
> > >> > > and then looks for a response before another 1/3 elapses. This
> > allows
> > >> > time
> > >> > > to reconnect to a
> > >> > > different server (and still maintain the session) if the connected
> > >> server
> > >> > > becomes unavailable.
> > >> > >
> > >> > > Could you grep the following log message in your client log and
> tell
> > >> me
> > >> > how
> > >> > > much time C3 taken for the re-connection attempts.
> > >> > > "Client session timed out, have not heard from server in "
> > >> > >
> > >> > > C3 might have first attempted to reconnect to B and then A. Also,
> > >> need to
> > >> > > check how much time C3 taken to detect connection failure from
> > server
> > >> C.
> > >> > >
> > >> > > Could you please share the zk client log to dig more.
> > >> > >
> > >> > > Rakesh
> > >> > >
> > >> > >
> > >> > > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <
> > >> > > [hidden email]>
> > >> > > wrote:
> > >> > >
> > >> > > >  > ​
> > >> > > > 1) Could you tell me the status of Server C, is this lost
> > >> connection to
> > >> > > the
> > >> > > >  >     quorum and fails to join quorum continuously as B is the
> > >> Leader
> > >> > ?
> > >> > > >
> > >> > > > Yes, B the leader. C Server is completely isolated from the
> > >> Leader(B)
> > >> > > > and It cannot communicate with the Leader. C cannot continuously
> > >> > connect
> > >> > > to
> > >> > > > the
> > >> > > >
> > >> > > > Leader.
> > >> > > >
> > >> > > >
> > >> > > >  > 2) C3 is connected C. Please tell me the connection host
> string
> > >> > passed
> > >> > > > to
> > >> > > >  >     this client. Does it contains all three servers info
> > >> > > "A:clientport,
> > >> > > >  >    B:clientport, C:clientport" ?
> > >> > > >
> > >> > > > Yes, C3's connection string contains all three servers.
> > >> ("A:clientport,
> > >> > > > B:clientport, C:clientport")
> > >> > > >
> > >> > > >
> > >> > > >  > 3) Please check all three servers and client C3 logs to see
> any
> > >> > > >  >    inconsistencies or exceptions.
> > >> > > >
> > >> > > > After looking at logs, it seems when the server C isolated from
> > the
> > >> > > Leader,
> > >> > > >
> > >> > > > a disconnect event fires to client C3. Then it (C3) tries too
> much
> > >> time
> > >> > > to
> > >> > > > connect to Server B(Leader) .
> > >> > > >
> > >> > > > But it cannot connect to server B, as we blocked the connection
> > >> between
> > >> > > > Server C and
> > >> > > >
> > >> > > > Server B. Basically, C3 tries more than half of the session
> > timeout
> > >> > time
> > >> > > to
> > >> > > > connect to Server B.
> > >> > > >
> > >> > > > Then after figuring out that C3 cannot to connect to Server B,
> it
> > >> tries
> > >> > > to
> > >> > > > connect
> > >> > > >
> > >> > > > to Server A, and it connects to Server A successfully. But this
> is
> > >> too
> > >> > > > late, because
> > >> > > >
> > >> > > > session is already expired at the time C3 connected.
> > >> > > >
> > >> > > > And this happens sometimes only. Because when we specify all the
> > >> > servers
> > >> > > in
> > >> > > > the client's
> > >> > > >
> > >> > > > connect string, sometimes after C3 disconnecting from Server C,
> > >> instead
> > >> > > of
> > >> > > > trying to connect to
> > >> > > >
> > >> > > > Server B it connects to Server A as the first attempt. In this
> > case
> > >> the
> > >> > > > client C3 connects to the
> > >> > > >
> > >> > > > quorum successfully before the session expiration.
> > >> > > >
> > >> > > > According to my understanding, it looks like, when a client
> trying
> > >> to
> > >> > > > connect to a server that it cannot
> > >> > > >
> > >> > > > connect due to a network partitioning, it uses a blocking call
> and
> > >> it
> > >> > > waits
> > >> > > > too much time trying to
> > >> > > >
> > >> > > > connect to a server that it cannot communicate.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >  > 4) ZooKeeper version used in your testing ?
> > >> > > >
> > >> > > > I used zookeeper 3.4.9 (current stable release)
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <
> > >> > [hidden email]
> > >> > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > Could you please give few more details,
> > >> > > > >
> > >> > > > > ​​
> > >> > > > > 1) Could you tell me the status of Server C, is this lost
> > >> connection
> > >> > to
> > >> > > > the
> > >> > > > > quorum and fails to join quorum continuously as B is the
> Leader
> > ?
> > >> > > > >
> > >> > > > > 2) C3 is connected C. Please tell me the connection host
> string
> > >> > passed
> > >> > > to
> > >> > > > > this client. Does it contains all three servers info
> > >> "A:clientport,
> > >> > > > > B:clientport, C:clientport" ?
> > >> > > > >
> > >> > > > > 3) Please check all three servers and client C3 logs to see
> any
> > >> > > > > inconsistencies or exceptions.
> > >> > > > >
> > >> > > > > 4) ZooKeeper version used in your testing ?
> > >> > > > >
> > >> > > > >
> > >> > > > > Rakesh
> > >> > > > >
> > >> > > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara <
> > >> > > > [hidden email]
> > >> > > > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > ​Recently, carried out a test to to find the behavior of
> > clients
> > >> > > when a
> > >> > > > > > client is partitioned from the ensemble.
> > >> > > > > >
> > >> > > > > > Here I used a ensemble of 3 zookeeper servers called A, B
> and
> > C.
> > >> > And
> > >> > > > > quorum
> > >> > > > > > was set up like below.
> > >> > > > > >
> > >> > > > > > A - Follower
> > >> > > > > > B - Leader
> > >> > > > > > C - Follower​
> > >> > > > > >
> > >> > > > > > A  <---> B <---> C
> > >> > > > > >    \____________/
> > >> > > > > >
> > >> > > > > > And 3 clients are connected to ensemble like below.
> > >> > > > > >
> > >> > > > > > C1 is connected A
> > >> > > > > > C2 is connected B
> > >> > > > > > C3 is connected C.
> > >> > > > > >
> > >> > > > > > I used iptables to remove the network link between B and C.
> > >> > > > > >
> > >> > > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > >> > > > > >
> > >> > > > > > After removing the link connections looks like below.
> > >> > > > > >
> > >> > > > > > A  <----> B         C
> > >> > > > > >    \____________/
> > >> > > > > >
> > >> > > > > > Simply there is no way to communicate from B to C and vice
> > >> versa.
> > >> > > > > >
> > >> > > > > > Here What I noticed is that the client connected to
> Zookeeper
> > >> > Server
> > >> > > > "C",
> > >> > > > > > could not connect to the ensemble resulting a session
> > expiration
> > >> > > > timeout.
> > >> > > > > >
> > >> > > > > > For this experiment I used tickTime of 3000ms and client
> > session
> > >> > > > > expiration
> > >> > > > > > timeout of 45000ms. And tested with different combinations
> > also.
> > >> > > > > >
> > >> > > > > > Can someone please explain what is the root cause for this
> > >> > behavior?
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>