New to zookeeper

Luigi Tagliamonte
Hello, Zookeeper Users!
I'm currently configuring and exploring Zookeeper.
I've been reading a lot about ensembles and scaling, and I have some questions that
I'd like to put to an expert audience.
I need Zookeeper as a Kafka dependency, so my deployment goal is ensemble
reliability, especially because the latest Kafka version uses Zookeeper only to
store partition leader information.

Here are my questions:

- To manage the ensemble I decided to use Exhibitor - what do you think
about it? Should I look at something else?

- Is there a way to discover all the servers of an ensemble apart from
using the four-letter commands? I wonder if it is possible to do something
like in Cassandra, where you contact one node and get the whole cluster
info from it. Should I just configure a DNS name per Zookeeper server? That
doesn't scale well in a dynamic environment such as servers in an
autoscaling group.

- Is there any white paper that describes a truly scalable and reliable
Zookeeper installation? Any resources are welcome!

Thank you all in advance!
Regards

Re: New to zookeeper

Washko, Daniel
I speak strictly from my own experience with Zookeeper, not in any official capacity for the project or for Exhibitor.

Exhibitor works great: it lets you easily automate clustering Zookeeper nodes into an ensemble and discover the individual nodes in the ensemble via an HTTP call. We ran into a problem, though, after we rolled Exhibitor out across our infrastructure. Every so often our Zookeeper ensembles lost the data they stored. I cannot say this was caused by Exhibitor, but we have Solr clouds where Exhibitor was not used and they never had this problem.

My suspicion is that a Zookeeper node had a problem, and Exhibitor removed that node from the ensemble and did a rolling restart. When the node recovered, its data was for some reason corrupted or lost. Exhibitor pulled the node back into the ensemble and did another rolling restart. That node became leader, and when the others rejoined they synced from it, dumping their own stored data to match the leader. This is speculation: I have had a very hard time replicating it and have not heard of anyone else having this problem. Again, I am not definitively saying Exhibitor is the cause, but since we removed Exhibitor the problem has not occurred.

The Zookeeper 3.5.x branch adds discovery functionality and does automated clustering. It’s great, but from what I understand it is still in alpha.

Prior to the 3.5.x branch I know of no way to discover which nodes are actually in the ensemble. The four-letter commands will tell you whether a node is in an ensemble and whether it is a leader or follower, but they will not tell you which ensemble it is in or list any information about the other nodes. If someone has a way to do this, please post, because I have looked all over.
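
For reference, the role check described above can be scripted. This is only a minimal sketch, assuming the `srvr` four-letter command is enabled on the server and the client port is the default 2181. The function names are my own illustration, not part of any Zookeeper tooling.

```python
import socket


def four_letter_word(host, command, port=2181, timeout=5.0):
    """Send a four-letter command (e.g. "srvr") and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply):
    """Extract the value of the "Mode:" line, or None if it is absent."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Calling `parse_mode(four_letter_word("zk1.example.com", "srvr"))` (the hostname is a placeholder) yields the node's own role and nothing about its peers, which is exactly the limitation described above.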

We make use of Scalr, which adds another layer of automation. I run orchestration scripts in Scalr that discover the other running Zookeeper nodes in (what Scalr calls) the same Farm Role. The script configures each node with the information for the other nodes and restarts Zookeeper to bring them into an ensemble. It then collects the nodes' IP addresses and stores them in a Scalr Global Variable that is available to Solr. Changes to the ensemble are reflected in this variable, which is passed to the Solr cloud, where a restart of the service updates the Zookeeper information in Solr. We are working towards moving this functionality to Consul, which will register the Zookeeper ensemble information so that Solr can pull it from Consul instead of relying on Global Variables. What I am getting at is that, outside the 3.5.x branch, automating this takes a bit of work.
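
To give a rough idea of what such a script has to produce, here is a minimal sketch, not the actual Scalr script. It assumes the list of node IPs has already been discovered by some other means, and it uses the conventional ports 2888, 3888 and 2181. Both function names are hypothetical.

```python
def render_server_lines(ips, quorum_port=2888, election_port=3888):
    """Return (myid, zoo.cfg line) pairs; ids follow the sorted IP order."""
    return [
        (myid, f"server.{myid}={ip}:{quorum_port}:{election_port}")
        for myid, ip in enumerate(sorted(ips), start=1)
    ]


def connection_string(ips, client_port=2181):
    """Build the host:port list that clients such as Solr are given."""
    return ",".join(f"{ip}:{client_port}" for ip in sorted(ips))
```

Each node would get the full set of `server.N` lines in its `zoo.cfg` and its own number in its `myid` file. Note that ids derived from sort order shift when membership changes, so a real script would need to persist the id assigned to each node.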


--
Daniel S Washko
Solutions Architect



[hidden email]  <http://www.gannett.com/>
       
In reply to Luigi Tagliamonte's message of 7/11/17, 6:58 PM.

Re: New to zookeeper

Alexander Shraer-2
Just a small comment - 3.5.3 is in beta. The getConfig API returns a list
of servers in the cluster, including their ports and roles in the ensemble.
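
As a rough illustration of what that looks like from a client, here is a minimal Python sketch. It assumes a 3.5.x server, where, as far as I know, the same data is readable from the `/zookeeper/config` znode as lines of the form `server.N=host:port:port[:role];[address:]clientport`. The use of the third-party kazoo client and both function names are my own assumptions, not something prescribed by the project.

```python
def parse_dynamic_config(text):
    """Parse "server.N=..." lines into a list of dicts."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue  # skips e.g. the trailing "version=..." line
        key, value = line.split("=", 1)
        quorum_part, _, client_part = value.partition(";")
        fields = quorum_part.split(":")  # assumes hostnames or IPv4, not IPv6
        servers.append({
            "id": int(key[len("server."):]),
            "host": fields[0],
            "quorum_port": int(fields[1]),
            "election_port": int(fields[2]),
            "role": fields[3] if len(fields) > 3 else "participant",
            "client_port": int(client_part.rsplit(":", 1)[-1]) if client_part else None,
        })
    return servers


def fetch_config(hosts):
    """Read the config znode from a 3.5.x server using the kazoo client."""
    from kazoo.client import KazooClient  # third-party; imported lazily

    client = KazooClient(hosts=hosts)
    client.start()
    try:
        data, _stat = client.get("/zookeeper/config")
    finally:
        client.stop()
    return parse_dynamic_config(data.decode("utf-8"))
```

With this, contacting any one reachable server, e.g. `fetch_config("10.0.0.1:2181")` (a placeholder address), would return every member with its ports and role.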


Alex

In reply to Daniel Washko's message of Wed, Jul 12, 2017, 7:53 AM.