Can ZK be used for my use case?

Can ZK be used for my use case?

Tavi
Hi everyone,

I have a web application that generates different types of files (in different versions) on the server.  Those files must be transferred to 300 clients (watchers) and, after the transfer, each client must modify every file for its own purposes.  The "client" is a simple standalone Java application installed on each user's PC.
Today my users must manually download their files from an FTP server and use a Java library to convert them.  Of course, I don't know whether they do this when the file version changes, and I have no trace of who did it ...

My idea is to use a ZK server with 300 clients, each of which will monitor a specific namespace for changes (for example: client 1 will monitor /app1 and /app2, client 2 -> /app2 only ...).  Every namespace will contain the name of the file, its location, its version ... When the file version changes, the Java client will use this information to retrieve the file from my central FTP server and modify it. "My leader" needs to know which users are connected, and it must synchronize the downloads to avoid network bottlenecks.
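
One way to picture the "synchronize the downloads" requirement is a fixed number of download slots. The sketch below is plain Java with no ZooKeeper calls; the class name DownloadSlots and the slot count are assumptions for illustration. It only works inside one JVM, so across 300 machines the slots would have to be kept in a shared coordination service instead.

```java
import java.util.concurrent.Semaphore;

// Hypothetical helper: caps how many downloads run at the same time.
// In-process sketch only; across machines the permits would need to live
// in a shared coordination service.
class DownloadSlots {
    private final Semaphore permits;

    DownloadSlots(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns true if a slot was free and is now held by the caller.
    boolean tryStart() {
        return permits.tryAcquire();
    }

    // Must be called once per successful tryStart(), when the download ends.
    void finish() {
        permits.release();
    }

    // Number of slots that are currently unused.
    int freeSlots() {
        return permits.availablePermits();
    }
}
```

A client that gets false from tryStart() would simply wait and try again later, which spreads the FTP load over time.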

Does ZK fit this scenario?

Thanks for your time,
  Tavi

Re: Can ZK be used for my use case?

Martin Kou
How large are your files? ZooKeeper is generally not designed for storing large data. Also, it doesn't guarantee that watchers will be called when a network outage is involved.

Sent from my iPhone

> On Sep 5, 2013, at 7:36 AM, Tavi <[hidden email]> wrote:
> [...]

RE: Can ZK be used for my use case?

Rakesh R
Hi Tavi,

If I understood the use case correctly, you have different types of files on an FTP server. There are 300 clients that are interested in changes to these files and watch them; when a version changes, they access the files and act on them.

If this is the case, you can create a set of znodes, each representing one file, such as /parentNode/file1, /parentNode/file2, /parentNode/file3, etc.
Interested clients can set data watches on these znodes, and you write your client-side logic to run when a watch notification is received.
For example,
Client1 -> /parentNode/file1, /parentNode/file2
Client2 -> /parentNode/file2, /parentNode/file3
Client3 -> /parentNode/file4, /parentNode/file1

Say Client1 wants to modify /parentNode/file2. It should first acquire a lock on the file (please see the distributed lock recipe using ZooKeeper). After the modification it should add metadata to the znode /parentNode/file2 (host:port, or any other unique key that traces who made the change) and release the lock. After the metadata of /parentNode/file2 is updated, Client1 and Client2, the two clients watching that znode, would get the data watch notification and can act accordingly.

All your clients can set child watches on /parentNode to learn when files are added to the FTP server, and then decide whether to set data watches on the new znodes.

Always keep this in mind: the default limit on znode data is about 1 MB (the jute.maxbuffer setting), and it is recommended to keep far less than that on a znode.
ZooKeeper is designed as a high-read, high-throughput system for small data, not as a store for very large values. The 1 MB limit is a default and can be overridden, but that is not advised. Increasing it a little will probably not damage your system, though it all depends on your access patterns, and such changes should be made with care and at your own risk.
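
A quick guard on the writing side can keep payloads well under that limit. This is only a minimal sketch: the class name ZnodePayload and the 64 KB budget are assumptions chosen for illustration, not ZooKeeper settings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard: checks a metadata string against a size budget
// before it is written to a znode.
class ZnodePayload {
    // Assumed application budget, far below the roughly 1 MB default limit.
    static final int MAX_BYTES = 64 * 1024;

    // Encodes the text as UTF-8 and rejects it if it exceeds the budget.
    static byte[] encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_BYTES) {
            throw new IllegalArgumentException(
                "payload is " + bytes.length + " bytes, budget is " + MAX_BYTES);
        }
        return bytes;
    }
}
```

The returned byte array is what would be handed to the znode write; file name, location and version fit easily, while file contents do not belong there.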

Also, network fluctuations can affect watch notifications: a client has a real chance of missing a notification while its connection is unstable, so you should handle ZooKeeper connection events carefully.
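
To make the missed-notification risk concrete: ZooKeeper watches are one-time triggers, so a client has to set the watch again each time it reads. The sketch below imitates that behaviour with a small in-memory class. OneShotNode and RewatchingClient are hypothetical names, and nothing here calls the ZooKeeper API; it only shows why the client should re-read and compare versions rather than count notifications.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a single znode with one-shot watches.
// It imitates "fire once, then re-register"; it is not the ZooKeeper API.
class OneShotNode {
    private String data = "";
    private int version = 0;
    private final List<Runnable> watchers = new ArrayList<>();

    // Returns the current data and, if watcher is non-null,
    // registers it for the next change only.
    synchronized String read(Runnable watcher) {
        if (watcher != null) {
            watchers.add(watcher);
        }
        return data;
    }

    synchronized int version() {
        return version;
    }

    // Stores new data, bumps the version and fires each pending watcher once.
    void write(String newData) {
        List<Runnable> toFire;
        synchronized (this) {
            data = newData;
            version++;
            toFire = new ArrayList<>(watchers);
            watchers.clear();
        }
        for (Runnable w : toFire) {
            w.run();
        }
    }
}

// Hypothetical client: every refresh re-reads the data and sets a new watch.
class RewatchingClient {
    private final OneShotNode node;
    String lastSeenData;
    int lastSeenVersion;
    int notifications = 0;

    RewatchingClient(OneShotNode node) {
        this.node = node;
        refresh();
    }

    // Call after each notification and also after every reconnect.
    void refresh() {
        synchronized (node) {
            lastSeenData = node.read(() -> notifications++);
            lastSeenVersion = node.version();
        }
    }
}
```

If the node is written twice before the client calls refresh(), the client gets one notification, yet the refresh still returns the latest data, and the version jumping by two reveals that an intermediate update went by unannounced.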



BTW, how many files would be present on the FTP server? Is the number of clients fixed at 300, or does it grow dynamically?

What happens if a client misses one of the version change notifications? I would also like to know how frequently the same file changes.

-Rakesh

-----Original Message-----
From: Martin Kou [mailto:[hidden email]]
Sent: 06 September 2013 09:23
To: [hidden email]
Cc: [hidden email]
Subject: Re: Can ZK be used for my use case?

[...]

RE: Can ZK be used for my use case?

Tavi
Hi,

First of all I want to thank you for your answers.

I understand the limits on node size, and I don't intend to use ZK to transfer my data, which, by the way, can reach a few gigabytes.

I need something to act as an orchestra conductor (a coordination system) that announces a new version to all my active clients (the leader will inform its clients about the version number and, maybe, some brief information about the files to be downloaded and handled ...).

My first idea was to create a central web service that would provide my update information and receive a confirmation from each client that has finished processing. But all the logic I need seems to be included in the ZK core (communication, group handling, authentication ...).
On my FTP server I can have around 40k files. A full transfer can be done in 30 minutes at most.

My number of clients can change dynamically, but the leader will know in advance when a new client is installed or deleted. In fact, I was thinking that the client would be a Java daemon manually installed on the client side, configured to connect to my main ZK server and to watch a dedicated node where the leader stores that client's specific update information.

Network fluctuations are a real problem, but I saw that the ZK API provides the logic to handle this situation.

Regarding the frequency of changes: once a day a new version will be created on my central FTP site, but usually only a few files will change (max. 2 MB each). Once a month a big file (1 to 10 GB) must be updated on the client side.
If a client misses a modification, this can be a problem ...

In fact, Rakesh gave me an idea:
- First, the leader will create the "hierarchical namespace", something like
/application_1/A_group/client_a1
/application_1/A_group/client_a2
      ....
/application_x/y_group/client_yn
/application_x/versions/1
/application_x/versions/2

where "client_yn" is a dedicated node for a single client.
- Each client node will hold a document (XML, JSON or other), let's say "status.data.xml".
- When a new version is created, the leader will update the "version number" in every "status.data.xml" and create a new node "/application_x/versions/N" holding another information document with the summary of changes (names of the FTP files to be added, replaced or deleted on the client).
- When a client starts, it will read its own "status.data.xml", compare the latest version number with the one saved locally and, if necessary, access "/application_x/versions/N" to get all the information about the version to be handled. After each processing run it will write back to its "status.data.xml" the time when the conversion finished. At the end it will set a watch on its "status.data.xml" data.
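
The "compare the latest version number with the one saved locally" step could look roughly like this, assuming versions are dotted numbers (a plain string comparison would wrongly rank "2.0.9" above "2.0.12"). VersionOrder is a hypothetical helper name, and the sketch does not touch ZooKeeper at all.

```java
// Hypothetical helper: compares dotted version strings numerically.
class VersionOrder {
    // Returns a negative number, zero or a positive number,
    // in the same way as Comparator.compare.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.0" equals "2.0.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when the version announced by the leader is newer than the local one.
    static boolean needsUpdate(String localVersion, String announcedVersion) {
        return compare(localVersion, announcedVersion) < 0;
    }
}
```

Because the comparison is done on every start and not only on a watch event, a client that was offline during an announcement still catches up.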

Is this scenario a valid one, or have I misunderstood the use of ZK?

Thanks again,
    Tavi

Re: Can ZK be used for my use case?

Edward Ribeiro
You understand the use of ZK very well, imo.

I'd change a thing or two in your client ---> ZK operation, but the idea
would still be the same. For example, why give the leader the task of
creating the /application_1/A_group/client_a1 znode when each client can
create its own "client_a1" znode if it doesn't already exist? But see what
best fits your use case.

Regards,
Edward




On Thu, Sep 19, 2013 at 4:44 PM, Tavi <[hidden email]> wrote:

> [...]

RE: Can ZK be used for my use case?

Rakesh R
Hi Tavi,


Yeah, I hope you have got a grip on ZooKeeper's features :) From a bird's-eye view, ZooKeeper is suited to your requirement.

> Network fluctuations are a real problem, but I saw that the ZK API provides the logic to handle this situation.
Yeah, you are correct. It would also be helpful to read more about handling KeeperException and the different KeeperState transitions.
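
As a rough picture of what handling the state transitions means for a client daemon, here is a small decision table. The enum ConnState is a local stand-in that mirrors three values of ZooKeeper's Watcher.Event.KeeperState; Reaction and ConnectionPolicy are hypothetical names, and the reactions are one plausible policy, not something ZooKeeper prescribes.

```java
// Local stand-in mirroring three values of Watcher.Event.KeeperState.
enum ConnState { SyncConnected, Disconnected, Expired }

// Hypothetical reactions a client daemon could take.
enum Reaction { REREAD_AND_REWATCH, WAIT_FOR_RECONNECT, OPEN_NEW_SESSION }

class ConnectionPolicy {
    // Maps a connection state to the action the daemon should take next.
    static Reaction onStateChange(ConnState state) {
        switch (state) {
            case SyncConnected:
                // Connected (again): changes may have happened meanwhile.
                return Reaction.REREAD_AND_REWATCH;
            case Disconnected:
                // The client library keeps trying to reconnect by itself.
                return Reaction.WAIT_FOR_RECONNECT;
            case Expired:
                // The session is gone; a new client handle is needed.
                return Reaction.OPEN_NEW_SESSION;
            default:
                throw new IllegalArgumentException("unhandled state: " + state);
        }
    }
}
```

The important case is Expired: the old session cannot be revived, so the daemon has to connect again from scratch and set all its watches anew.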

> /application_x/y_group/client_yn
> /application_x/versions/1
> /application_x/versions/2

Here you can also think about logic for purging the version znodes (I mean removing the old ones); otherwise they would grow without bound, no?
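
The selection part of such a purge could be sketched like this, assuming the children of the versions node have purely numeric names as in the example paths. VersionPurge and the keep parameter are hypothetical; the code only computes which names to remove and does not delete anything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: decides which children of a "versions" node
// are old enough to delete.
class VersionPurge {
    // childNames are numeric names such as "1", "2", ...;
    // keep is how many of the newest versions to retain.
    static List<String> toDelete(List<String> childNames, int keep) {
        if (keep < 0) {
            throw new IllegalArgumentException("keep must not be negative");
        }
        List<Integer> numbers = new ArrayList<>();
        for (String name : childNames) {
            numbers.add(Integer.parseInt(name));
        }
        Collections.sort(numbers); // oldest first
        int cut = Math.max(0, numbers.size() - keep);
        List<String> result = new ArrayList<>();
        for (int n : numbers.subList(0, cut)) {
            result.add(String.valueOf(n));
        }
        return result;
    }
}
```

A real purge would probably also check that no active client still has to process a version before removing it.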


> When a client starts, it will read its own "status.data.xml", compare the latest version number with the one saved locally and, if necessary, access "/application_x/versions/N" to get all the information about the version to be handled. After each processing run it will write back to its "status.data.xml" the time when the conversion finished. At the end it will set a watch on its "status.data.xml" data.

From your explanation, 'status.data.xml' is used to keep the file information. Am I correct?
BTW, I didn't fully understand the concept of this file. Who manages and updates 'status.data.xml'?


-Rakesh

-----Original Message-----
From: Tavi [mailto:[hidden email]]
Sent: 20 September 2013 01:14
To: [hidden email]
Subject: RE: Can ZK be used for my use case?

[...]

RE: Can ZK be used for my use case?

Tavi
Hi everyone, :)

I will try to answer your questions:

"status.data.xml ... the concept of this file ..."

- It will contain information that applies to one specific client:
        - the current version number (e.g. 2.0.12)
        - a few specific variables, which can be changed by the leader and are used by the client when it processes the data downloaded from the FTP server (e.g. region number: 24, const_1: 3.4234, const_2: 3% ... and so on). Their values are not version specific.
        - after each successful run on the client side, the client will write back, in the same document, the date when a specific version was processed (see <updates_status>).

        <current_version release_date="2014-06-01">2.0.12</current_version>
        <variables>
                <item name="region">24</item>
                <item name="const_1">3.4234</item>
                <item name="const_2">3%</item>
                ...
        </variables>
        <updates_status>
                <version nb="2.0.12" status="ok" date="2014-07-05" />
                <version nb="2.0.12" status="err" date="2014-06-05">
                        <errMsg> ... Some FTP Exception ...</errMsg>
                </version>
                <version nb="2.0.11" status="ok" date="2014-05-05" />
                <version nb="2.0.10" status="ok" date="2014-03-05" />
        </updates_status>
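
Reading such a document back with the standard Java XML parser could look like the sketch below. It assumes the fragment is wrapped in a single root element (for example <status>), which is not shown above, and StatusReader is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the status document; assumes one root element.
class StatusReader {
    // Returns the text of <current_version>, or null if the element is absent.
    static String currentVersion(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("current_version");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // True if some <version nb="..."> entry for this version has status="ok".
    static boolean processedOk(String xml, String version) {
        NodeList nodes = parse(xml).getElementsByTagName("version");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (version.equals(e.getAttribute("nb"))
                    && "ok".equals(e.getAttribute("status"))) {
                return true;
            }
        }
        return false;
    }

    // Parses the text; malformed input is reported as IllegalArgumentException.
    private static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("status document is not well-formed XML", e);
        }
    }
}
```

With this, the leader can ask for each client whether the current version already has an "ok" entry, which is the information it needs for the database.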

The leader will collect all the "updates_status" information from each client and transfer it to a database to be handled by a main "application manager".

"why give the leader the task of creating the /application_1/A_group/client_a1"
The leader is my "orchestra conductor": it must decide when a client can be activated (a new node is created) or when it is disabled (the node is deleted) ...

Of course, I will have some logic to purge all unnecessary information and nodes; for the moment I'm in a preliminary analysis, so I'm a little bit lower than a "bird's-eye view" :).

I really appreciate your time,

My best regards,
   Tavi