Cloudera Engineering Blog · HBase Posts

New in CDH 5.2: Improvements for Running Multiple Workloads on a Single HBase Cluster

These new Apache HBase features in CDH 5.2 make multi-tenant environments easier to manage.

Historically, Apache HBase has treated all tables, users, and workloads with equal weight. This approach is sufficient for a single workload, but when multiple users and workloads share the same cluster or table, conflicts can arise. Fortunately, starting with HBase in CDH 5.2 (HBase 0.98 + backports), workloads and users can now be prioritized.

Tuning Java Garbage Collection for HBase

This guest post from Intel Java performance architect Eric Kaczmarek (originally published here) explores how to tune Java garbage collection (GC) for Apache HBase, focusing on 100% YCSB reads.

Apache HBase is an Apache open source project offering NoSQL data storage. Often used together with HDFS, HBase is widely used across the world. Well-known users include Facebook, Twitter, Yahoo, and more. From the developer’s perspective, HBase is a “distributed, versioned, non-relational database modeled after Google’s Bigtable, a distributed storage system for structured data”. HBase can easily handle very high throughput by either scaling up (i.e., deployment on a larger server) or scaling out (i.e., deployment on more servers).

NoSQL in a Hadoop World

The number of powerful data query tools in the Apache Hadoop ecosystem can be confusing, but understanding a few simple things about your needs usually makes the choice easy. 

Ah, the good old days. I recall vividly that in 2007, I was faced with storing 1 billion XML documents and making them accessible as well as searchable. Given a shoestring budget, I had few choices: build something on my own (it was the rage back then, and still is), use an existing open source database like PostgreSQL or MySQL, or try this thing that Google built successfully and that was now implemented in open source under the Apache umbrella: Hadoop.

Project Rhino Goal: At-Rest Encryption for Apache Hadoop

An update on community efforts to bring at-rest encryption to HDFS — a major theme of Project Rhino.

Encryption is a key requirement for many privacy- and security-sensitive industries, including healthcare (HIPAA regulations), card payments (PCI DSS regulations), and the US government (FISMA regulations).

How-to: Use Kite SDK to Easily Store and Configure Data in Apache Hadoop

Organizing your data inside Hadoop doesn’t have to be hard — Kite SDK helps you try out new data configurations quickly in either HDFS or HBase.

Kite SDK is a Cloudera-sponsored open source project that makes it easier for you to build applications on top of Apache Hadoop. Its premise is that you shouldn’t need to know how Hadoop works to build your application on it, even though that’s an unfortunately common requirement today (because the Hadoop APIs are low-level; all you get is a filesystem and whatever else you can dream up — well, code up).

HBaseCon 2014 is a Wrap!

HBaseCon 2014 is in the books. Thanks to all attendees, speakers, and sponsors!

HBaseCon 2014, much like a butterfly, lived for a short number of hours on Monday — but it certainly was beautiful while it lasted! (See photos here.)

How-to: Extend Cloudera Manager with Custom Service Descriptors

Thanks to Jonathan Natkins of WibiData for the post below about how his company extended Cloudera Manager to manage Kiji. Learn more about Kiji and the organizations using it to build real-time HBase applications at Kiji Sessions, happening on May 6, 2014, the day after HBaseCon.

As a partner of Cloudera, WibiData sees Cloudera Manager’s new extensibility framework as one of the most exciting parts of Cloudera Enterprise 5. Cloudera Manager 5.0.0 provides the single-pane view that Apache Hadoop administrators and operators want to effectively manage a cluster of machines. Additionally, Cloudera Manager now offers tight integration for partners to plug into the CDH ecosystem, which benefits Cloudera as well as WibiData.

Sneak Preview: "Case Studies" Track at HBaseCon 2014

The HBaseCon 2014 “Case Studies” track surfaces some of the most interesting (and diverse) use cases in the HBase ecosystem — and in the world of NoSQL overall — today.

HBaseCon 2014 (May 5, 2014 in San Francisco) is not just about internals and best practices; it’s also a place to explore use cases that you may not have even considered before.

Sneak Preview: "Ecosystem" Track at HBaseCon 2014

The HBaseCon 2014 “Ecosystem” track offers a cross-section view of the most interesting projects emerging on top of, or alongside, HBase.

HBaseCon 2014 (May 5, 2014 in San Francisco) is not just a reflection of HBase itself; it’s also a celebration of the entire ecosystem. Thanks again, Program Committee!

Sneak Preview: "Features & Internals" Track at HBaseCon 2014

The HBaseCon 2014 “Features & Internals” track covers the newest developments in Apache HBase functionality.

The HBaseCon 2014 (May 5, 2014 in San Francisco) agenda has something for everyone – particularly, developers building apps on HBase. Thanks again, Program Committee!
