Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

mark harwood
First, a quick thanks for releasing this project - very useful.

I've had success working with the sourceforge version (2.2.1), but when I tried moving to the Apache SVN trunk version I found that the servers fail to find each other.

My test environment has 3 ZooKeeper servers, all running on the same machine and started from the command line in different directories.
I changed my startup batch files to run QuorumPeerMain in place of QuorumPeer, wiped the data directories (keeping the "myid" files) and reused the previous zoo.cfg files (example below).

#########  Server 1 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2181
electionPort=2881
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 2 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2182
electionPort=2882
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

#########  Server 3 ##################
tickTime=2000
initLimit=10
syncLimit=5
dataDir=data
clientPort=2183
electionPort=2883
server.1=localhost:2881
server.2=localhost:2882
server.3=localhost:2883

When I fire up each server, they all hang with the following output:

D:\tmp\Zookeeper3Servers\server2>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg
INFO  - [QuorumPeer:QuorumPeer@379] - LOOKING
WARN  - [QuorumPeer:FastLeaderElection@493] - New election: 0

I tried running one of the servers from Eclipse in debug mode, and it appeared to loop around FastLeaderElection.lookForLeader().

While poking around in the debugger, I also noticed that this test in QuorumCnxManager.toSend failed:
    if (addr.equals(localIP))
It failed because addr held "localhost/127.0.0.1" while localIP held my 10.20.x.x address on the local network.
I tried changing the zoo.cfg files to use the 10.20.x.x address; this made the "if" statement above evaluate to true, but the end result was the same: the servers failed to connect.
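
The failing comparison can be reproduced in isolation. This is a sketch, not ZooKeeper code: LocalAddressCheck and isLocal are hypothetical names, and 10.20.0.1 is only a placeholder for the elided 10.20.x.x address. Treating loopback and wildcard addresses as local, then asking java.net.NetworkInterface whether any interface owns the address, is one possible way to answer "is this configured peer me?", assumed here for illustration.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper, not ZooKeeper code: decides whether a configured peer
// address refers to this machine without relying on a single "local IP".
class LocalAddressCheck {

    static boolean isLocal(InetAddress addr) {
        // Loopback (e.g. 127.0.0.1) and the wildcard address (0.0.0.0) always mean this host.
        if (addr.isLoopbackAddress() || addr.isAnyLocalAddress()) {
            return true;
        }
        try {
            // Otherwise the address is local only if one of our network interfaces owns it.
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress configured = InetAddress.getByName("127.0.0.1"); // "localhost" in zoo.cfg
        InetAddress localIP = InetAddress.getByName("10.20.0.1");    // placeholder for 10.20.x.x
        // The comparison from the post: false, even when both addresses belong to this host.
        System.out.println("equals:  " + configured.equals(localIP));
        // The broader check recognises the loopback address as local.
        System.out.println("isLocal: " + isLocal(configured));
    }
}
```

Literal IP strings are passed to getByName, so no DNS lookup is involved.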

If it helps, the logging from my sourceforge 2.2.1 run of the above config produces the following and works fine:

D:\servers\IeIncrementalIndexingTests\ZookeeperServers\server3>java -cp lib\zookeeper-dev.jar;lib\log4j-1.2.15.jar;conf com.yahoo.zookeeper.server.quorum.QuorumPeer conf/zoo.cfg
WARN  - [QuorumPeer:QuorumPeer@388] - LOOKING
WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
WARN  - [QuorumPeer:LeaderElection@95] - 3      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 1      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 2      -> 1
WARN  - [QuorumPeer:LeaderElection@89] - Election tally:
WARN  - [QuorumPeer:LeaderElection@95] - 3      -> 1
WARN  - [QuorumPeer:LeaderElection@95] - 2      -> 2
WARN  - [QuorumPeer:QuorumPeer@397] - FOLLOWING
WARN  - [QuorumPeer:Follower@124] - Following localhost/127.0.0.1:2882
WARN  - [QuorumPeer:Follower@171] - Getting a snapshot from leader
WARN  - [NIOServerCxn.Factory:NIOServerCnxn@471] - Connected to /127.0.0.1:2375 lastZxid 0
WARN  - [NIOServerCxn.Factory:NIOServerCnxn@500] - Creating new session 31c03d951fe0000
WARN  - [QuorumPeer:Follower@219] - Got zxid 100000001 expected 1
WARN  - [SyncThread:Profiler@34] - Elapsed 10717 ms: Logfile padding exceeded time threshold
WARN  - [Thread-0:NIOServerCnxn@774] - Finished init of 31c03d951fe0000: true

This looks like it uses a different leader election algorithm.

Any ideas?
Cheers,
Mark


Send instant messages to your online friends http://uk.messenger.yahoo.com

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

Flavio Junqueira
Mark, please use a port for electionPort that differs from the one you're using
in the server configuration (the server.N lines).

Thanks,
-Flavio

> -----Original Message-----
> From: mark harwood [mailto:[hidden email]]
> Sent: Wednesday, August 27, 2008 1:12 PM
> To: [hidden email]
> Subject: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

Re: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

mark harwood
After some further analysis I think I have found a bug.

In QuorumCnxManager.toSend there is a call to create a connection as follows:
    channel = SocketChannel.open(new InetSocketAddress(addr, port));

Unfortunately, "addr" is the IP address of the remote server, while "port" is the electionPort of *this* server.
As an example, given this configuration (taken from my zoo.cfg):
  server.1=10.20.9.254:2881
  server.2=10.20.9.9:2882
  server.3=10.20.9.254:2883
Server 3 was observed trying to connect to host 10.20.9.9 on port 2883, which naturally failed.

In tests where all machines use the same electionPort, this bug would not manifest itself.
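
To make the mismatch concrete, here is a simplified model of choosing the endpoint to dial for an election message. It is not the real QuorumCnxManager code: ElectionTarget, addPeer, buggyTarget and fixedTarget are hypothetical names, and the per-peer map stands in for whatever structure the server actually keeps. It contrasts the pattern described above (remote host plus local port) with using the remote peer's own host and port.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not the real QuorumCnxManager: shows which endpoint a
// server dials when sending an election message to a peer.
class ElectionTarget {

    // Peer id -> that peer's own election endpoint (host and port from zoo.cfg).
    private final Map<Long, InetSocketAddress> peers = new HashMap<>();
    // The electionPort setting of the server running this code.
    private final int myElectionPort;

    ElectionTarget(int myElectionPort) {
        this.myElectionPort = myElectionPort;
    }

    void addPeer(long id, String host, int port) {
        // createUnresolved skips DNS; only the host string and port matter here.
        peers.put(id, InetSocketAddress.createUnresolved(host, port));
    }

    // Pattern reported in the post: the remote host paired with the LOCAL election port.
    InetSocketAddress buggyTarget(long remoteId) {
        return InetSocketAddress.createUnresolved(peers.get(remoteId).getHostString(), myElectionPort);
    }

    // Corrected pattern: the remote peer's own host AND its own port.
    InetSocketAddress fixedTarget(long remoteId) {
        return peers.get(remoteId);
    }

    public static void main(String[] args) {
        // The configuration from the post, as seen by server 3 (electionPort 2883).
        ElectionTarget server3 = new ElectionTarget(2883);
        server3.addPeer(1, "10.20.9.254", 2881);
        server3.addPeer(2, "10.20.9.9", 2882);
        server3.addPeer(3, "10.20.9.254", 2883);

        InetSocketAddress bad = server3.buggyTarget(2);
        InetSocketAddress good = server3.fixedTarget(2);
        System.out.println("buggy: " + bad.getHostString() + ":" + bad.getPort());   // 10.20.9.9:2883
        System.out.println("fixed: " + good.getHostString() + ":" + good.getPort()); // 10.20.9.9:2882
    }
}
```

With identical election ports on every machine, buggyTarget and fixedTarget return the same endpoint, which is why the bug stays hidden in that setup.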

Cheers,
Mark

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

mark harwood
>>Please use a port for electionPort different from the one you're using in the server configuration.

I think I am getting confused about the range of port numbers that must be defined. I had assumed there were only two types, clientPort and electionPort, representing client-server and server-server communication respectively, as shown in the overview diagram here:
    http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription

It sounds like there may be another type of port to deal with - is this right?

I previously added a comment about electionPorts to the wiki documentation (http://zookeeper.wiki.sourceforge.net/ZooKeeperGettingStarted) to clarify my understanding of the config settings.
While this interpretation works fine in sourceforge 2.2.1, I am now confused about the arrangement in Apache 3.0. I spent most of yesterday debugging and trying different configuration files. The sourceforge version worked fine, but when I swapped in the Apache jars (keeping the same zoo.cfg files) it just wouldn't work, whether running on a single machine or on several. Sometimes this was because the Apache version used the wrong port when trying to talk to another machine (when I configured each server with a different electionPort), and sometimes a single server would get a BindException trying to open the same ServerSocket twice.

I suspect this may be down to my misunderstanding of the ports now used and a change since the sourceforge version. Can you cast any more light on this?

I'd also be keen to get some advice on whether to go with sourceforge 2.2.1 or Apache 3.x for an upcoming deployment to a live system. I imagine ZK 3 may be a bit of a moving target, but is it more likely to get bug fixes than ZK 2.2.1?

Many thanks,
Mark

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

Flavio Junqueira
With the new leader election, we require a third port. So there is the
clientPort, the port servers use to communicate during regular ZooKeeper
operation, and a third port for leader election among the ZooKeeper
servers. The previous leader election algorithm used UDP, while regular
communication among servers uses TCP. Since the new leader election uses
TCP, we had the choice of either integrating leader election with the
regular quorum protocol or keeping it separate and requiring another port
from each server. We kept them separate to simplify the implementation, but
we have been discussing merging them at some point. Once we merge them,
we'll require only one server-to-server port, but it will take some
restructuring of the code.
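
To make the three ports concrete, here is a hypothetical pre-flight check that a deployment script might run when several servers share one host. PortPlan and ServerPorts are invented names, not ZooKeeper classes, and the sketch assumes every listed port is a TCP listener on the same machine, as in the single-machine setup from the first post. It flags any port number claimed twice, which is what happens when electionPort equals the port in the server.N line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check, not part of ZooKeeper: for servers sharing one
// host, the client, quorum and election ports must all be distinct TCP ports.
class PortPlan {

    static final class ServerPorts {
        final int id, clientPort, quorumPort, electionPort;

        ServerPorts(int id, int clientPort, int quorumPort, int electionPort) {
            this.id = id;
            this.clientPort = clientPort;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    // Returns one message per port number that is claimed more than once.
    static List<String> conflicts(List<ServerPorts> servers) {
        Map<Integer, String> owner = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (ServerPorts s : servers) {
            claim(owner, problems, s.clientPort, "server." + s.id + " clientPort");
            claim(owner, problems, s.quorumPort, "server." + s.id + " quorum port");
            claim(owner, problems, s.electionPort, "server." + s.id + " electionPort");
        }
        return problems;
    }

    private static void claim(Map<Integer, String> owner, List<String> problems, int port, String who) {
        String previous = owner.putIfAbsent(port, who);
        if (previous != null) {
            problems.add("port " + port + " used by " + previous + " and " + who);
        }
    }

    public static void main(String[] args) {
        // The single-machine setup from the first post: electionPort equals the quorum port.
        List<ServerPorts> original = List.of(
                new ServerPorts(1, 2181, 2881, 2881),
                new ServerPorts(2, 2182, 2882, 2882),
                new ServerPorts(3, 2183, 2883, 2883));
        // Same setup with separate election ports (3881-3883 are arbitrary example values).
        List<ServerPorts> separated = List.of(
                new ServerPorts(1, 2181, 2881, 3881),
                new ServerPorts(2, 2182, 2882, 3882),
                new ServerPorts(3, 2183, 2883, 3883));

        conflicts(original).forEach(System.out::println);   // three conflicts
        System.out.println(conflicts(separated).isEmpty()); // true
    }
}
```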

The figure you pointed out on the wiki reflects earlier versions of
ZooKeeper. It is probably a good idea to update it, or at least to clarify
points like yours.

Hope it is clear now.

-Flavio


> -----Original Message-----
> From: mark harwood [mailto:[hidden email]]
> Sent: Thursday, August 28, 2008 11:55 AM
> To: [hidden email]
> Subject: RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

Benjamin Reed-2
I think I understand your confusion Mark. There are actually three ports
used. It's always been this way, but there was a trick we could use to
avoid requiring the third port in the configuration file. Let me go
through the ports and I think it may become clear.

The first port is the "client port". Clients of ZooKeeper connect to
this TCP port.

The second port is the "quorum port" (not the greatest name). The
ZooKeeper servers communicate with each other using this TCP port to
process state changes.

The third port is the "leader election port". ZooKeeper servers use this
port to communicate with each other to elect a leader.

Now a couple of questions need to be answered:

Q. Why are there both a quorum port and a leader election port? Since both
are used for server-to-server communication, wouldn't it be better to use
just one?

A. Yes, it would be better. Eventually, we would like to make it that
way. The difficulty comes from the different communication topologies in
the two cases. In processing state changes we have a star topology: all
servers connect to a leader to send and receive changes. For leader
election we need a full mesh, since we do not have a leader yet, so
everyone needs to talk to everyone else. Since the protocols and the
topologies are different, it is easiest to write them as two completely
separate pieces of code.
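
The difference between the two topologies can be put in numbers. The sketch below is illustrative arithmetic only, not ZooKeeper code; Topology, starLinks and meshLinks are hypothetical names, and it assumes exactly one connection per communicating pair of servers.

```java
// Illustrative arithmetic, not ZooKeeper code: links needed among n servers,
// assuming one connection per communicating pair.
class Topology {

    // Star (state changes): each of the n - 1 followers connects to the leader.
    static int starLinks(int n) {
        return n - 1;
    }

    // Full mesh (leader election): every pair of servers talks, n(n-1)/2 pairs.
    static int meshLinks(int n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.printf("n=%d star=%d mesh=%d%n", n, starLinks(n), meshLinks(n));
        }
    }
}
```

For a 3-server ensemble the counts are close (2 versus 3), but they diverge quickly: 7 servers need 6 star links versus 21 mesh links.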

Q. Why did I not have to specify the election port in the sourceforge
releases? What is this "trick to avoid specifying the election port in
the config file"?

A. As it turns out, there are a couple of versions of leader election.
The default version on sourceforge was UDP based. Because UDP and TCP
have separate port namespaces, we could use the same port number for
both, so we used the quorum port specified in the config file for both
updates and leader election. On Apache we changed the default to a
TCP-based leader election. (It's faster and deals with firewalls
better.) When leader election uses TCP, we can't use our trick anymore,
so we need another port number for leader election.
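
The port-namespace point can be checked directly with the standard java.net classes. This is only an illustration: PortNamespaces and demo are hypothetical names, and the sketch binds to the loopback address and an OS-chosen port rather than anything from zoo.cfg. A DatagramSocket can bind the same port number as an existing TCP ServerSocket, but a second ServerSocket on that port fails with BindException, which is consistent with the BindException reported earlier in the thread.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustration of separate port namespaces: UDP and TCP may share a port number,
// but two TCP listeners on the same address and port may not.
class PortNamespaces {

    static String demo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Let the OS choose a free TCP port to play the role of the quorum port.
        try (ServerSocket quorum = new ServerSocket(0, 50, loopback)) {
            int port = quorum.getLocalPort();
            // UDP has its own namespace, so this bind normally succeeds (the old "trick").
            try (DatagramSocket udpElection = new DatagramSocket(port, loopback)) {
                // A TCP election listener on the same port collides with the quorum listener.
                try (ServerSocket tcpElection = new ServerSocket(port, 50, loopback)) {
                    return "second TCP bind succeeded";
                } catch (BindException e) {
                    return "UDP shared the port; second TCP bind failed";
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```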

Does this make sense?

Unfortunately, the transition to Apache has taken a long time. We
probably will not have a stable release for a couple more weeks. (Unlike
sourceforge, we cannot decide a release is ready and push it out that
evening; Apache has a much more involved process.) Future development
will take place on Apache. There is a bug with sync() that we want to
fix on sourceforge with another release, but I don't expect there will
be any more releases on sourceforge after that.

If you need leader election to run on different ports, then until
ZOOKEEPER-127 is fixed you can use the configuration file to set the
leader election algorithm to 0 (electionAlg=0). That was the default on
sourceforge.

Thanx
ben

Re: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other

mark harwood
Flavio, Ben, many thanks for your detailed response.

This tallies with the further investigations I have been making into this. Slowly the mists are clearing...
From an administrator's point of view, two ports would be less error-prone to configure than three, but I can see the rationale for the current design.

I plan to put together a patch for ZOOKEEPER-127 in which the config file contains the changes you described, Ben, allowing quorumPort AND lePort to be stored with the server details. I suspect the change may not be isolated to QuorumPeerConfig, but we'll see.
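
For reference, the ZooKeeper 3.x documentation describes server lines of the form server.N=host:port:port, where the first port is used by followers to connect to the leader and the second for leader election. Below is a minimal sketch of reading that shape from java.util.Properties. ServerLineParser and Peer are hypothetical names, this is not how QuorumPeerConfig is written, and the simple split on ':' does not handle IPv6 literals or the optional single-port form.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical parser, not ZooKeeper's QuorumPeerConfig: reads server lines that
// carry both ports, in the form server.N=host:quorumPort:electionPort.
class ServerLineParser {

    static final class Peer {
        final String host;
        final int quorumPort;
        final int electionPort;

        Peer(String host, int quorumPort, int electionPort) {
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    static Map<Long, Peer> parse(Properties cfg) {
        Map<Long, Peer> peers = new TreeMap<>();
        for (String key : cfg.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue; // clientPort, dataDir, etc. are not server lines
            }
            long id = Long.parseLong(key.substring("server.".length()));
            // Simple split on ':'; IPv6 literals would need extra handling.
            String[] parts = cfg.getProperty(key).trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException(key + " must be host:quorumPort:electionPort");
            }
            peers.put(id, new Peer(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return peers;
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("clientPort", "2181");
        // Election ports 3881-3883 are example values.
        cfg.setProperty("server.1", "localhost:2881:3881");
        cfg.setProperty("server.2", "localhost:2882:3882");
        cfg.setProperty("server.3", "localhost:2883:3883");

        parse(cfg).forEach((id, p) ->
                System.out.println("server." + id + " -> " + p.host
                        + " quorum=" + p.quorumPort + " election=" + p.electionPort));
    }
}
```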

Cheers
Mark




----- Original Message ----
From: Benjamin Reed <[hidden email]>
To: [hidden email]
Sent: Thursday, 28 August, 2008 15:00:25
Subject: RE: Migrating from sourceforge 2.2.1 to Apache trunk - QuorumPeers failing to find each other