Our thanks to Jordan Zimmerman, software engineer at Netflix, for the guest post below about the recently announced Apache Curator (incubating) project.
Apache ZooKeeper (zookeeper.apache.org) is a client/server system for distributed coordination. On the client side, you use the client library (from Java, C/C++, etc.) to connect to the server. The client library exposes APIs that resemble a simple filesystem. You create/read/update/delete ZNodes via the API.
The ZooKeeper documentation describes high-level recipes that can be built using this API. However, someone new to ZooKeeper will quickly discover a steep learning curve where they have to:
- Manually deal with connection issues
- Handle “recoverable” errors as described in the ZooKeeper documentation
- Implement any needed “recipes”
- Learn and understand numerous undocumented ZooKeeper edge cases and best practices
At Netflix, there was a lot of interest in ZooKeeper but not a lot of experience. A few trials had been done but none of them made it out of the testing sandbox. For that reason, Curator was initially conceived as a way to make ZooKeeper easier to use for non-experts. The original versions of what would become Curator consisted of a small library meant for internal use, but it quickly became clear that it would be useful to others outside Netflix.
Curator’s main benefits are:
- A simplified API
- Automatic ZooKeeper connection management with retries
- Complete, well-tested implementations of ZooKeeper recipes
- A framework that makes writing new ZooKeeper recipes much easier
The creation of Curator also coincided with a desire to build a Netflix OSS presence, so, in 2011, Curator was released as an open source project on GitHub. It quickly became the de facto way for Java programmers to use ZooKeeper, and as the Curator community grew, Netflix realized that Curator might be better suited to becoming an Apache project. Thus, Curator is now in the Apache Incubator, with Cloudera’s Patrick Hunt serving as its champion.
New users of ZooKeeper are surprised to learn that a significant amount of connection management must be done manually. For example, when the ZooKeeper client connects to the ensemble, it must negotiate a new session, which takes some time. If you use a ZooKeeper client API before the connection process is complete, ZooKeeper will throw an exception. These types of exceptions are referred to as “recoverable” errors.
Curator automatically handles connection management for you, greatly simplifying client code. Instead of directly using the ZooKeeper APIs, you use Curator APIs that internally check for connection completion and wrap each ZooKeeper API call in a retry loop, so recoverable errors are retried automatically. The retry method is customizable: Curator comes bundled with several implementations (ExponentialBackoffRetry, etc.), or you can write your own.
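To make the idea of a retry loop concrete, here is a minimal sketch in plain Java of retrying an operation with exponentially growing pauses. It is not Curator's implementation: the class `RetrySketch`, the method `callWithRetry`, and the `RecoverableException` type are hypothetical names invented for this illustration, and the exception simply stands in for a ZooKeeper "recoverable" error.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Curator or ZooKeeper.
final class RetrySketch {

    /** Stand-in for a "recoverable" error such as a lost or not-yet-established connection. */
    static final class RecoverableException extends RuntimeException {
        RecoverableException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation, retrying up to maxRetries times when it fails with a
     * RecoverableException. The pause before retry number n is baseSleepMs * 2^n.
     */
    static <T> T callWithRetry(Supplier<T> operation, int maxRetries, long baseSleepMs) {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.get();
            } catch (RecoverableException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure to the caller
                }
                // Cap the shift so the sleep time cannot overflow.
                sleepQuietly(baseSleepMs << Math.min(attempt, 20));
            }
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

A caller would wrap each operation, for example `RetrySketch.callWithRetry(() -> readConfig(), 3, 1000L)`, where `readConfig` is a placeholder for whatever call might hit a recoverable error. Any other exception type propagates immediately, since only recoverable errors are worth retrying.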
The ZooKeeper documentation describes many possible uses for ZooKeeper, calling each a “recipe.” While the distribution comes bundled with a few implementations of these recipes, most ZooKeeper users will need to manually implement one or more of the recipes.
Implementing a ZooKeeper recipe is not trivial. Besides the connection handling issues mentioned earlier, there are numerous edge cases that are not well documented. For example, many recipes require that an ephemeral-sequential node be created. New users of ZooKeeper will not know that there is an edge case in ephemeral-sequential node creation that requires you to put a special “marker” in the node’s name so that you can search for the created node if an I/O failure occurs. This is but one of many edge cases that are not yet well documented.
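The “marker” technique can be sketched in plain Java as follows. This is an illustration under assumptions, not the code Curator uses: the class `ProtectedNameSketch`, its methods, and the `_m_` prefix format are all hypothetical. The idea is to embed a unique ID in the requested node name, so that if the create call fails with an I/O error, the client can list the parent's children and check whether the server actually created the node.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper for illustration only; the "_m_" prefix format is an assumption.
final class ProtectedNameSketch {

    /** Generates a marker that is unique to one create attempt. */
    static String newMarker() {
        return UUID.randomUUID().toString();
    }

    /** Builds the node name to request; the server appends the sequence number to it. */
    static String protectedName(String marker, String baseName) {
        return "_m_" + marker + "-" + baseName;
    }

    /**
     * After an I/O failure, searches the parent's child names for the marker.
     * A match means the node was created even though the client never saw the reply.
     */
    static Optional<String> findCreatedNode(List<String> children, String marker) {
        String prefix = "_m_" + marker + "-";
        return children.stream()
                .filter(name -> name.startsWith(prefix))
                .findFirst();
    }
}
```

In a real client, the list passed to `findCreatedNode` would come from fetching the parent node's children after reconnecting. If a match is found, the client adopts that node instead of creating a second one; if not, it is safe to retry the create.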
We hope that the Curator community will continue to grow and make using ZooKeeper more productive and more enjoyable. We invite you to participate and contribute via curator.incubator.apache.org.