Apache HBase junkies, this one’s for you: I recently had the chance for a quick chat with Nick Dimiduk and Cloudera’s Amandeep Khurana, the authors of HBase in Action (Manning Publications; download sample chapter PDF).
Why did you write HBase in Action?
Amandeep: HBase in Action is about how to use Apache HBase effectively. When we first started talking about this topic, Nick and I scoffed at the idea of writing an entire 300-page book. After all, it really is just three API calls: Get, Put, and Scan. But as we talked further, we realized that building successful applications using HBase (or for that matter other NoSQL stores) is not as trivial as the API itself, especially for people with a relational DBMS background. There was much unlearning to be done and several new concepts to be learned about thinking at scale.
As we started to think about this more, we also noticed that until very recently, the focus in the HBase community was on making HBase a more solid, performant, and stable data store. Most people using it in production know the internals well enough to understand what to do and how to do it. But it also became clear that adoption is being inhibited by the fact that considerable time is needed to figure out how to use HBase effectively, and there is little content available about that.
Nick: Amandeep nailed this one. This work is designed to be the “HBase User’s Guide” or “operator’s manual” rather than a “mechanic’s guide.” Continuing the automobile analogy: We’ll teach you how to choose the right car for your needs, the basic rules of the road, how to deal with the potholes you’ll encounter along the way, how to change your oil, and when to go for a tune-up. We don’t get into rebuilding the engine block or detailed wiring diagrams; we leave those to readers interested in diving into the code and engaging with the committers.
Who is your intended reader?
Amandeep: The intention of the book is to teach developers how to build applications using HBase and thereafter deploy and operationalize them. The book is geared toward folks who have some development background and have likely built applications using other databases.
During your research, did you learn anything about HBase you didn’t know?
Amandeep: There were several things (including some that were unrelated to HBase) I learned during this project. The first example is schema design – I had used most of the concepts at different times but never thought about how they all worked together, so creating a simple example for the book was a little challenging. Coprocessors and asynchbase were the other two topics that I really didn’t know a whole lot about until they made their way into the manuscript.
Nick: The big take-away for me was the actual depth of my knowledge. Everything I wrote about forced me to explore the topic even deeper than I had previously. This was the case regardless of how well I thought I knew it going in. I also very much enjoyed getting into the internals of OpenTSDB. There aren’t many examples of HBase applications out there for people to explore. Be it literature or code or application architecture, spend time reading and it will teach you to be a better writer.
Some readers will be surprised to learn that HBase offers alternative clients for non-Java development (via the Thrift gateway; see sample chapter). How important is that fact for HBase adoption?
Amandeep: To date, the primary client for HBase development has been the native Java client. However, having a single client limits adoption. There is not enough documentation and guidance available about the other clients, which limits their adoption and in turn inhibits the investment that HBase developers put into them. With alternate clients becoming more accessible, more dev time would go into them, making them as good as the native client in the long run. This could be key for driving adoption.
What’s the most interesting application of HBase you’ve ever seen?
Amandeep: In my mind the really interesting applications are where HBase is serving real-time traffic with tight SLAs. Those are not only harder to develop but also harder to operationalize and tune. That’s where the challenge lies, and that’s where the opportunity lies too.
Nick: I’m most excited about the work my team at Climate is doing with HBase now and even more excited about what we’ll be doing with it next year. Chapter 8 is a simplified version of one aspect of my current project: enabling GIS applications on HBase. The next project will be hosting huge scientific datasets and exposing them using community standard tools and protocols; I’m optimistic we’ll choose HBase to tackle this challenge as well.
You can also meet Amandeep and Nick at Strata Conference + Hadoop World, Oct. 23-25, in New York. Cloudera will be giving away 200 copies at our booth as part of the “Meet the Author” program. For an HBase deep dive there, catch Amandeep’s “Using HBase” session. (Use discount code CLOUDERA to get an additional 20% off conference registration!)