The following is a guest post from Aaron Kimball, who was Cloudera’s first engineer and the creator of the Apache Sqoop project. He is the Founder and CTO at WibiData, a San Francisco-based company building big data applications.
Our team at WibiData has been developing applications on Hadoop since 2010, and we’ve helped many organizations transform how they use data by deploying Hadoop. HBase in particular has allowed companies of all types to drive their business using scalable, high-performance storage. Organizations have started to leverage these capabilities for various big data applications, including targeted content, personalized recommendations, enhanced customer experience and social network analysis.
While building many of these applications, we have seen emerging tools, design patterns and best practices repeated across projects. One of the clear lessons learned is that Hadoop and HBase provide very low-level interfaces. Each large-scale application we have built on top of Hadoop has required a great deal of scaffolding and data management code. This repetitive programming is tedious and error-prone, and it makes application interoperability more challenging in the long run.
Today, we are proud to announce the launch of the Kiji project (www.kiji.org), as well as the first Kiji component: KijiSchema. The Kiji project was developed to host a suite of open source components, built on top of Apache HBase and Apache Hadoop, that make it easier for developers to:
- Use HBase as a real-time data storage and serving layer for applications
- Maximize HBase performance using data management best practices
- Get started building data applications quickly with easy startup and configuration
Kiji is open source and licensed under the Apache 2.0 license. The Kiji project is modularized into separate components to simplify adoption and encourage clean separation of functionality. Our approach emphasizes interoperability with other systems, leveraging the open source HBase, Avro and MapReduce projects, enabling you to easily fit Kiji into your development process and applications.
KijiSchema: Schema Management for HBase
The first component within the Kiji project is KijiSchema, which provides layout and schema management on top of HBase. KijiSchema gives developers the ability to easily store both structured and unstructured data within HBase using Avro serialization. It supports a variety of rich schema features, including complex, compound data types, HBase column key and time-series indexing, as well as cell-level evolving schemas that dynamically encode version information.
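To make the idea of cell-level evolving schemas concrete, here is a minimal sketch in plain Python. It is not KijiSchema's API or its on-disk format: JSON stands in for Avro, and the `SCHEMAS` registry, the 4-byte schema-id prefix, and the `encode_cell` / `decode_cell` helpers are all assumptions made for illustration. The point it shows is that when each cell records which schema wrote it, a reader on a newer schema can still interpret older cells.

```python
import json

# Hypothetical schema registry: schema id -> reader fields and their defaults.
SCHEMAS = {
    1: {"name": ""},
    2: {"name": "", "email": None},  # version 2 adds an optional field
}


def encode_cell(schema_id, record):
    """Prefix the serialized record with the id of the schema that wrote it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return schema_id.to_bytes(4, "big") + payload


def decode_cell(cell, reader_schema_id):
    """Recover the writer schema id, then project the record onto the reader schema."""
    writer_schema_id = int.from_bytes(cell[:4], "big")
    record = json.loads(cell[4:].decode("utf-8"))
    reader_fields = SCHEMAS[reader_schema_id]
    # Fields missing from the old record fall back to the reader's defaults.
    resolved = {field: record.get(field, default)
                for field, default in reader_fields.items()}
    return writer_schema_id, resolved


# A cell written under schema 1 is still readable under schema 2.
old_cell = encode_cell(1, {"name": "Ada"})
writer_id, user = decode_cell(old_cell, reader_schema_id=2)
```

Because the version travels with the cell rather than with the table, old and new records can sit side by side in the same column without a bulk rewrite.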
KijiSchema promotes the use of entity-centric data modeling, where all information about a given entity (user, mobile device, ad, product, etc.), including dimensional and transaction data, is encoded within the same row. This approach is particularly valuable for user-based analytics such as targeting, recommendations, and personalization.
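The entity-centric layout can be sketched with an in-memory stand-in for a table. This is only an illustration of the data model, not of Kiji or HBase client code: the `EntityTable` class, its method names, and the `family:qualifier` column strings are hypothetical. Each entity owns one row, and that row holds both dimensional data (such as a name) and timestamped transaction data (such as purchases).

```python
from collections import defaultdict


class EntityTable:
    """Toy in-memory table: one row per entity, timestamped cells per column."""

    def __init__(self):
        # row key -> column name ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, entity_id, column, timestamp, value):
        """Write one timestamped cell into the entity's row."""
        self._rows[entity_id][column][timestamp] = value

    def _cells(self, entity_id, column):
        # Use .get so that reads never create empty rows or columns.
        return self._rows.get(entity_id, {}).get(column, {})

    def latest(self, entity_id, column):
        """Return the most recent value in a column, or None if it is empty."""
        cells = self._cells(entity_id, column)
        return cells[max(cells)] if cells else None

    def history(self, entity_id, column):
        """Return all (timestamp, value) pairs in a column, newest first."""
        return sorted(self._cells(entity_id, column).items(), reverse=True)


table = EntityTable()
table.put("user:42", "info:name", 1, "Ada")            # dimensional data
table.put("user:42", "purchases:item", 10, "book")     # transaction data
table.put("user:42", "purchases:item", 20, "lamp")

name = table.latest("user:42", "info:name")
purchases = table.history("user:42", "purchases:item")
```

Everything needed to score or personalize for `user:42` comes back from a single row lookup, which is what makes this layout convenient for targeting and recommendation workloads.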
BentoBox: Get Started Developing with Kiji and Hadoop Fast
To aid developers new to HBase and Hadoop, we are also providing the quickest, easiest deployment of HBase as part of a Kiji BentoBox, which will install and run a fully functional HBase mini-cluster with KijiSchema on your machine in under 15 minutes. You do not need to have Hadoop or HBase installed to run the BentoBox. You can get a Kiji BentoBox and KijiSchema here.
We encourage current HBase developers to check out KijiSchema by downloading the source code from GitHub. Over the next several months, we will be releasing additional Kiji components focused on improving the usability and performance of HBase and Hadoop for application development. We also welcome outside contributors who would like to help support and develop Kiji. You can join our mailing lists and learn more about how to contribute at kiji.org.