Before we decided on Apache ZooKeeper, our software architecture gave every tenant its own infrastructure instance, together with its own data processing service.
This approach scaled poorly and used resources inefficiently, so we decided to design a new system with horizontal scalability as a hard requirement. The system would consist of multiple processing workers and a job scheduler that together process all incoming messages and distribute the transformed data to all tenant databases. For that, we also needed centralized configuration storage and a communication medium between the scheduler and the workers. That was where ZooKeeper came into play.
Consistency and Partition Tolerance – Two out of Three
When designing a distributed system, there are three key properties to consider: consistency, availability and partition tolerance. According to the CAP theorem, a system can guarantee at most two of these three at the same time. In our specific case, we decided on ZooKeeper. Among the options we looked at, e.g. Consul, it stood out for its consistency and partition tolerance. ZooKeeper was also our favorite because of its maturity and stability.
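To make the consistency and partition tolerance trade-off a bit more concrete: a ZooKeeper ensemble keeps serving requests only while a majority (quorum) of its servers can reach each other. The following is a minimal sketch of that arithmetic; the function names are hypothetical helpers for illustration, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may be down or partitioned away while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, an ensemble of 3 servers tolerates 1 failure and an ensemble of 5 tolerates 2, while going from 3 to 4 servers adds no extra fault tolerance. The minority side of a partition stops serving rather than risk inconsistent data.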
Originally developed at Yahoo, ZooKeeper is described by the Apache Software Foundation as a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. This open-source technology was developed to simplify the implementation of advanced patterns in distributed systems. The patterns most relevant in our case were leader election and distributed locks.
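The idea behind the leader election pattern can be sketched without a running ZooKeeper. In the commonly documented recipe, every candidate creates an ephemeral sequential node, and the candidate holding the lowest sequence number is the leader; when its session ends, its node disappears and the next lowest takes over. The class below is a toy in-memory model of that rule only. `ToyElection` and the worker names are hypothetical, and nothing here talks to a real ZooKeeper server or reflects our production code.

```python
import itertools


class ToyElection:
    """In-memory model of 'lowest sequence number wins' leader election."""

    def __init__(self):
        self._counter = itertools.count()  # stands in for ZooKeeper's sequence numbers
        self._candidates = {}              # sequence number -> candidate name

    def join(self, name):
        """Register a candidate, like creating an ephemeral sequential node."""
        seq = next(self._counter)
        self._candidates[seq] = name
        return seq

    def leave(self, seq):
        """Remove a candidate, like its ephemeral node vanishing with the session."""
        self._candidates.pop(seq, None)

    def leader(self):
        """Return the candidate with the lowest sequence number, or None."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
first = election.join("scheduler-a")
second = election.join("scheduler-b")
print(election.leader())   # scheduler-a joined first, so it leads
election.leave(first)      # simulate scheduler-a losing its session
print(election.leader())   # scheduler-b takes over
```

A distributed lock follows the same shape: the holder of the lowest node owns the lock, and releasing it means deleting that node.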
A Stable and Reliable Service with Low-Level API
One of the main characteristics of ZooKeeper is its simple programming interface. It supports only a few operations: create, delete, exists, get data, set data, get children and sync. This small API is useful when you are building a service and need to implement very specific use cases. On the other hand, because the API is so low-level, implementing even simple patterns and handling all the exceptions and possible errors can be very time-consuming.
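To give a feel for how small this interface is, here is a toy in-memory model of a znode tree exposing the same handful of operations (sync is omitted because there is nothing to synchronize in a single process). `ToyZnodeTree` and the `/config/batch_size` path are hypothetical names for illustration; this is not a ZooKeeper client, and a real one also has to deal with sessions, watches and connection loss.

```python
class ToyZnodeTree:
    """In-memory stand-in for a znode tree; not a real ZooKeeper client."""

    def __init__(self):
        self._nodes = {"/": b""}  # path -> data, root always exists

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"no parent node: {parent}")
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = data

    def delete(self, path):
        if self.get_children(path):
            raise ValueError(f"node has children: {path}")
        del self._nodes[path]

    def exists(self, path):
        return path in self._nodes

    def get_data(self, path):
        return self._nodes[path]

    def set_data(self, path, data):
        if path not in self._nodes:
            raise KeyError(f"no such node: {path}")
        self._nodes[path] = data

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ToyZnodeTree()
tree.create("/config")
tree.create("/config/batch_size", b"100")
tree.set_data("/config/batch_size", b"250")
print(tree.get_children("/config"), tree.get_data("/config/batch_size"))
```

Everything beyond these primitives, such as locks, queues or leader election, has to be composed from them by the application or by a higher-level library.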
ZooKeeper can also be very slow and memory-consuming if you use it as a regular key-value store with a large number of write operations, as we did. All writes are ordered and go through a single leader, so it does not perform well under write-dominant workloads. Of course, this gave our system admins a lot of headaches. However, ZooKeeper is still a very stable and reliable service. It is fast for read-dominant workloads: clients can connect to different servers in the ensemble, and each server answers reads from its own copy of the data, so reads are served concurrently.
Despite all the benefits of using ZooKeeper as a coordination service, one should also consider some of the modern alternatives that already implement the most commonly used patterns and offer a high-level API. Taking into consideration all the pros and cons of using ZooKeeper, we hope to find a more modern yet still stable solution in the near future.
December 18, 2019