Cloudera Engineering Blog · HBase Posts

How-to: Scan Salted Apache HBase Tables with Region-Specific Key Ranges in MapReduce

Thanks to Pengyu Wang, software developer at FINRA, for permission to republish this post.

Salting an Apache HBase table and pre-splitting it is a proven way to distribute workload uniformly across RegionServers and prevent hot spots during bulk writes. In this design, a row key is the logical key prefixed with a salt. One way to generate the salt is to take the hash code of the logical row key (a date, for example) modulo n, where n is the number of regions.

Salting Row Keys
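
As a rough illustration of the scheme described above, the sketch below derives a salt as hash(logical key) mod n and prefixes it to the logical key. It is not the code from the original post: the CRC32 hash, the `|` separator, the zero-padded decimal salt, and the names `NUM_REGIONS`, `salt_for`, `salted_row_key`, and `split_points` are all assumptions made for this example.

```python
import zlib

NUM_REGIONS = 8  # hypothetical value of n, the number of pre-split regions


def salt_for(logical_key, num_regions=NUM_REGIONS):
    """Return hash(logical_key) mod n.

    CRC32 is an assumed stand-in for "the hash code"; any hash that is
    stable across processes would serve the same purpose.
    """
    return zlib.crc32(logical_key.encode("utf-8")) % num_regions


def salted_row_key(logical_key, num_regions=NUM_REGIONS):
    """Build a row key of the form <salt>|<logical key>.

    The salt is zero-padded to a fixed width so that byte-wise ordering
    of row keys matches numeric ordering of salts.
    """
    width = len(str(num_regions - 1))
    salt = salt_for(logical_key, num_regions)
    return f"{salt:0{width}d}|{logical_key}".encode("utf-8")


def split_points(num_regions=NUM_REGIONS):
    """Return n - 1 split keys so that each salt value maps to one region."""
    width = len(str(num_regions - 1))
    return [f"{s:0{width}d}".encode("utf-8") for s in range(1, num_regions)]


if __name__ == "__main__":
    print(salted_row_key("2015-06-01"))
    print(split_points())
```

Because the salt depends only on the logical key, a reader can recompute it for point lookups, while a range scan over logical keys has to be issued once per salt prefix.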

Inside Apache HBase’s New Support for MOBs

Learn about the design decisions behind HBase’s new support for MOBs.

Apache HBase is a distributed, scalable, performant, consistent key-value database that can store a variety of binary data types. It excels at storing many relatively small values (<10K) and at providing low-latency reads and writes.

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment.

The Apache Hadoop ecosystem has become a preferred platform for enterprises seeking to process and understand large-scale data in real time. Technologies like Apache Kafka, Apache Flume, Apache Spark, Apache Storm, and Apache Samza are increasingly pushing the envelope on what is possible. It is often tempting to bucket large-scale streaming use cases together, but in reality they tend to break down into a few different architectural patterns, with different components of the ecosystem better suited to different problems.

New in CDH 5.4: Apache HBase Request Throttling

The following post about the new request throttling feature in HBase 1.1 (now shipping in CDH 5.4) was originally published on the ASF blog. We republish it here for your convenience.

Running multiple workloads on HBase has always been challenging, especially when trying to execute real-time workloads while concurrently running analytical jobs. One possible way to address this issue is to throttle analytical MapReduce jobs so that real-time workloads are less affected.

Scan Improvements in Apache HBase 1.1.0

The following post, from Cloudera intern Jonathan Lawlor, originally appeared in the Apache Software Foundation’s blog.

Over the past few months there have been a variety of nice changes made to scanners in Apache HBase. This post focuses on two such changes, namely RPC chunking (HBASE-11544) and scanner heartbeat messages (HBASE-13090). Both of these changes address long-standing issues in the client-server scan protocol. Specifically, RPC chunking deals with how a server handles the scanning of very large rows, and scanner heartbeat messages allow scan operations to progress even when aggressive server-side filtering causes results to be returned infrequently.

Background

Apache Phoenix Joins Cloudera Labs

We are happy to announce the inclusion of Apache Phoenix in Cloudera Labs.

Apache Phoenix is an efficient SQL skin for Apache HBase that has created a lot of buzz. Many companies are successfully using this technology, including Salesforce.com, where Phoenix first started.

Sneak Preview: HBaseCon 2015 Use Cases Track

This year’s HBaseCon Use Cases track includes war stories about some of the world’s best examples of running Apache HBase in production.

As a final sneak preview leading up to the show next week, in this post I’ll give you a window into the Use Cases track at HBaseCon 2015 (May 7 in San Francisco).

Sneak Preview: HBaseCon 2015 Ecosystem Track

This year’s HBaseCon Ecosystem track covers projects that are complementary to HBase (with a focus on SQL) such as Apache Phoenix, Apache Kylin, and Trafodion.

In this post, I’ll give you a window into the Ecosystem track at HBaseCon 2015 (May 7 in San Francisco).

Sneak Preview: HBaseCon 2015 Development & Internals Track

This year’s HBaseCon Development & Internals track covers new features in HBase 1.0, what’s to come in 2.0, best practices for tuning, and more.

In this post, I’ll give you a window into the Development & Internals track at HBaseCon 2015 (May 7 in San Francisco).

Sneak Preview: HBaseCon 2015 Operations Track

This year’s HBaseCon Operations track features some of the world’s largest and most impressive operators.

In this post, I’ll give you a window into the Operations track at HBaseCon 2015 (May 7 in San Francisco).
