What’s Next for HBase? Big Data Applications Using Frameworks Like Kiji


Michael Stack is the chair of the Apache HBase PMC and has been a committer and project “caretaker” since 2007. Stack is a Software Engineer at Cloudera.

Apache Hadoop and HBase have quickly become industry standards for storing and analyzing Big Data in the enterprise, yet as adoption spreads, new challenges and opportunities have emerged. Today there is a large gap (a chasm, a gorge) between the clean application model your Big Data application builder designed and the raw, byte-based APIs provided by HBase and Hadoop. Many Big Data players have invested considerable time and energy in bridging this gap. Cloudera, where I work, is developing the Cloudera Development Kit (CDK). Kiji, an open source framework for building Big Data applications, is another thriving option. A lot of thought has gone into its design. More importantly, long experience building Big Data applications on top of Hadoop and HBase has been baked into how it all works.

Kiji provides a model and a set of libraries that let developers get up and running quickly. Intuitive Java APIs and Kiji’s rich data model let developers build business logic and machine learning algorithms without worrying about bytes, serialization, schema evolution, and other lower-level aspects of the system. The Kiji framework is modularized into separate components to support a wide range of uses and to encourage clean separation of functionality. Kiji’s main components are KijiSchema, KijiMR, KijiHive, KijiExpress, KijiREST, and KijiScoring. KijiSchema, for example, helps team members collaborate on long-lived Big Data management projects, does away with common incompatibility issues, and helps developers build more integrated systems across the board. All of these components are available in a single download called a BentoBox.
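To make the gap between byte-based APIs and a richer application model concrete, here is a minimal Java sketch under stated assumptions. RawCellStore and TypedTable are hypothetical classes written only for this illustration. They are not part of HBase or Kiji, and they do not reflect Kiji's actual API.

- RawCellStore mimics the byte-in, byte-out shape of a low-level store addressed by row, family, and qualifier.
- TypedTable shows the kind of serialization code a higher layer can absorb, so that application code deals only in strings and numbers.

Kiji itself goes much further (for example, managing schemas and schema evolution), which this toy does not attempt.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration only: RawCellStore and TypedTable are not HBase
 * or Kiji classes. They contrast a bytes-only store with a typed layer on top.
 */
public class TypedLayerSketch {

    /** A toy in-memory store that, like a low-level store, only understands bytes. */
    static final class RawCellStore {
        private final Map<String, byte[]> cells = new HashMap<>();

        // Encode each coordinate with Base64 and join with '|', which Base64 never emits.
        private static String key(byte[] row, byte[] family, byte[] qualifier) {
            Base64.Encoder enc = Base64.getEncoder();
            return enc.encodeToString(row) + "|" + enc.encodeToString(family)
                    + "|" + enc.encodeToString(qualifier);
        }

        void put(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            cells.put(key(row, family, qualifier), value.clone());
        }

        // Returns null when no cell exists at these coordinates.
        byte[] get(byte[] row, byte[] family, byte[] qualifier) {
            byte[] value = cells.get(key(row, family, qualifier));
            return value == null ? null : value.clone();
        }
    }

    /** A typed facade: its public methods take and return Strings and longs, not byte[]. */
    static final class TypedTable {
        private final RawCellStore store = new RawCellStore();

        private static byte[] utf8(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        void putString(String entity, String family, String column, String value) {
            store.put(utf8(entity), utf8(family), utf8(column), utf8(value));
        }

        String getString(String entity, String family, String column) {
            byte[] raw = store.get(utf8(entity), utf8(family), utf8(column));
            return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
        }

        // Longs are serialized as 8 big-endian bytes.
        void putLong(String entity, String family, String column, long value) {
            byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
            store.put(utf8(entity), utf8(family), utf8(column), raw);
        }

        Long getLong(String entity, String family, String column) {
            byte[] raw = store.get(utf8(entity), utf8(family), utf8(column));
            return raw == null ? null : ByteBuffer.wrap(raw).getLong();
        }
    }

    public static void main(String[] args) {
        TypedTable users = new TypedTable();
        users.putString("user-42", "info", "name", "Ada");
        users.putLong("user-42", "info", "visits", 7L);
        System.out.println(users.getString("user-42", "info", "name"));  // prints Ada
        System.out.println(users.getLong("user-42", "info", "visits"));  // prints 7
    }
}
```

The design point is simply where the encoding logic lives. Application code calls putLong and getString, while conversion to and from bytes is written once inside the layer rather than repeated in every caller.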

Historically, Hadoop and HBase have been considered difficult platforms to develop for. At Cloudera, we have made Hadoop deployment seamless, giving enterprises access to their Big Data. Taking the next step, building applications that make use of all this data, has usually been a lonely and trying endeavor, with developers having to build everything themselves from scratch. This is where projects like Kiji can help.

I look forward to the day when building Big Data applications is “easy,” when developers no longer have to concern themselves with serialization, schema evolution, or ensuring that their design aligns with the underlying store, because all of this is handled for them in rich layers that sit atop the raw Hadoop and HBase APIs. Kiji is a welcome step toward such a future.

If you are interested in learning more about Kiji, come to KijiCon (on June 14, the day after HBaseCon), sponsored by Cloudera and Opower. Attend a half-day Kiji training workshop, or join a meetup-style open forum to discuss potential use cases and the corresponding Kiji features and to do some hands-on Kiji hacking. There will also be beer, food, and some killer Kiji stories.

Register for KijiCon here: http://kijicon.eventbrite.com