HBase Training: Demystifying Real-Time Big Data Storage

We at Cloudera University have been busy lately, building and expanding our courses to help data professionals succeed. We’ve expanded the Hadoop Administrator course and created a new Data Analyst course. Now we’ve updated and relaunched our course on Apache HBase to help more organizations adopt Hadoop’s real-time Big Data store as a competitive advantage.

The course is designed to make sure developers and administrators with an HBase use case can start realizing value from day one. We doubled the length of the curriculum to four days, allowing a deep dive into HBase operations as well as development.

As the primary course author, I had the pleasure of interviewing some of the most notable members of the HBase community. People like Michael Stack, Lars George, and Amandeep Khurana have written the books, contributed code, and deployed and supported huge clusters in production. I also tried to capture many of the key insights that otherwise only exist in HBase’s tribal knowledge, some of which I discuss in my recent blog posts on the REST Interface and the Thrift Interface, as well as in the Simple User Access chapter of the Apache HBase Reference Guide.

Beyond the Tribe

The main theme of the four-day course is that using HBase effectively requires an understanding of both programming and operations topics. That combination is typically called DevOps, but here we have positioned it for a variety of roles and goals. With our core audiences of developers and administrators in mind, we built the course so that every participant both learns from and contributes to the learning of people in other roles. Developers will learn how their code can affect operations. Administrators will learn about common code and architecture issues that may influence their strategies. Business intelligence analysts and quality assurance engineers will learn how to interact with HBase and what challenges to anticipate.

Adding two more days to the HBase course allowed us to increase the number of hands-on exercises and ramp up the focus on real-world scenarios culled from customer engagements.

The curriculum dives deep into using the HBase API and coding with both Java and Python, including hints and solutions for each language in the exercises. Non-developers can also look through these examples to see how the HBase API works.

The administration-focused sections show how to add HBase to an existing cluster, covering both installation and configuration. HBase configuration has long been considered a black art, but the insights we’ve captured from our experts, both those working in the field and those writing the code, shed new light on the methods and best practices required for success.

Real-Time Access for the Real World

We strive to make the lessons from the course as relevant as possible, so we also cover some of the other ecosystem projects that work with or augment HBase. One exercise explores Apache Hive, which is a great tool for developers and analysts to access HBase using a SQL-like syntax instead of writing code. We also cover the Kiji Project and how it makes developing real-time Big Data applications with HBase easier.

Finally, we cover designing solutions with HBase. These chapters bring together and apply the lessons about HBase architecture and API programming from earlier in the course. Engineering an HBase solution is different from working with an RDBMS, so these chapters cover how to develop and execute a design according to particular access patterns and performance requirements.
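To give a feel for what designing around an access pattern can mean, here is a minimal sketch in plain Python. It is not taken from the course materials. It assumes a hypothetical access pattern ("show a user's most recent events first") and relies only on the fact that HBase keeps rows sorted by the bytes of the row key. The function names `make_row_key` and `user_prefix`, the zero-byte separator, and the reversed-timestamp layout are all illustrative choices.

```python
import struct

# Largest signed 64-bit value; subtracting a timestamp from it reverses sort order.
MAX_TS = 2**63 - 1


def user_prefix(user_id: str) -> bytes:
    """Key prefix shared by every row for one user (hypothetical layout)."""
    return user_id.encode("utf-8") + b"\x00"


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build a composite row key: user id, separator, reversed timestamp.

    Big-endian packing keeps numeric order and byte order aligned, so a
    newer timestamp produces a smaller key and sorts first for that user.
    """
    if not 0 <= timestamp_ms <= MAX_TS:
        raise ValueError("timestamp_ms must fit in a signed 64-bit integer")
    reversed_ts = MAX_TS - timestamp_ms
    return user_prefix(user_id) + struct.pack(">q", reversed_ts)
```

Sorting a handful of these keys with Python's built-in `sorted` mimics the byte ordering of a table: each user's rows stay contiguous, and within a user the newest event comes first. In an RDBMS you would usually get the same result with an index and an ORDER BY clause; here the ordering is baked into the key itself.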

If you want to learn HBase, are evaluating or starting an HBase project, or are standing up an HBase cluster, I highly recommend you take this course and start working towards the CCSHB HBase certification. We’ve taken the full HBase knowledge base, condensed it down to the most useful insights and best practices, and made it relevant for data professionals from any type of organization. You can even start by watching our recent Introduction to Apache HBase Training webinar to get a taste of the course content, understand the audience and prerequisites, and grab a promotion code for a discount on the live course.

Jesse Anderson is a Curriculum Developer and Instructor at Cloudera.
