The guest post below was originally authored by Pinterest engineer Raghavendra Prabhu and published by the Pinterest Engineering blog. Being big ZooKeeper fans, we re-publish it here for your convenience. Thanks, Pinterest!
Apache ZooKeeper is an open source distributed coordination service that’s popular for use cases like service discovery, dynamic configuration management and distributed locking. While it’s versatile and useful, it has failure modes that can be hard to prepare for and recover from,
Flavio Junqueira (PMC Chair of the Apache ZooKeeper project and a member of the Systems and Networking Group at Microsoft Research) and Benjamin Reed (PMC Member and Software Engineer at Facebook) are the co-authors of the new O’Reilly Media book ZooKeeper: Distributed Process Coordination. We had a chat with Flavio and Ben recently about the rationale for writing the book, and what it will add to the distributed systems conversation.
Apache ZooKeeper is a client/server system for distributed coordination that exposes an interface similar to a filesystem, where each node (called a znode) may contain data and a set of children. Each znode has a name and can be identified using a filesystem-like path (for example, /root-znode/sub-znode/my-znode).
In Apache HBase, ZooKeeper coordinates, communicates, and shares state between the Masters and RegionServers. HBase has a design policy of using ZooKeeper only for transient data (that is,
Our thanks to Jordan Zimmerman, software engineer at Netflix, for the guest post below about the recently announced Apache Curator (incubating) project.
Apache ZooKeeper (zookeeper.apache.org) is a client/server system for distributed coordination. On the client side, you use the client library (from Java, C/C++, etc.) to connect to the server. The client library exposes APIs that resemble a simple filesystem.
It’s widely accepted that you should never design or implement your own cryptographic algorithms but rather use well-tested, peer-reviewed libraries instead. The same can be said of distributed systems: Making up your own protocols for coordinating a cluster will almost certainly result in frustration and failure.
Architecting a distributed system is not a trivial problem; it is very prone to race conditions, deadlocks, and inconsistency. Making cluster coordination fast and scalable is just as hard as making it reliable.