Tag Archives: log

How-to: Scan Salted Apache HBase Tables with Region-Specific Key Ranges in MapReduce

Categories: Guest HBase How-to

Thanks to Pengyu Wang, software developer at FINRA, for permission to republish this post.

Salted Apache HBase tables with pre-split is a proven effective HBase solution to provide uniform workload distribution across RegionServers and prevent hot spots during bulk writes. In this design, a row key is made with a logical key plus salt at the beginning. One way of generating salt is by calculating n (number of regions) modulo on the hash code of the logical row key (date,

Read More

New in CDH 5.4: Sensitive Data Redaction

Categories: CDH Cloudera Manager Platform Security & Cybersecurity

The best data protection strategy is to remove sensitive information from everyplace it’s not needed.

Have you ever wondered what sort of “sensitive” information might wind up in Apache Hadoop log files? For example, if you’re storing credit card numbers inside HDFS, might they ever “leak” into a log file outside of HDFS? What about SQL queries? If you have a query like select * from table where creditcard = ‘1234-5678-9012-3456’,

Read More

Security, Hive-on-Spark, and Other Improvements in Apache Hive 1.2.0

Categories: Community Hive Spark

Apache Hive 1.2.0, although not a major release, contains significant improvements.

Recently, the Apache Hive community moved to a more frequent, incremental release schedule. So, a little while ago, we covered the Apache Hive 1.0.0 release and explained how it was renamed from 0.14.1 with only minor feature additions since 0.14.0.

Shortly thereafter, Apache Hive 1.1.0 was released (renamed from Apache Hive 0.15.0), which included more significant features—including Hive-on-Spark.

Read More

New in CDH 5.4: Apache HBase Request Throttling

Categories: CDH HBase

The following post about the new request throttling feature in HBase 1.1 (now shipping in CDH 5.4) originally published in the ASF blog. We re-publish it here for your convenience.

Running multiple workloads on HBase has always been challenging, especially  when trying to execute real-time workloads while concurrently running analytical jobs. One possible way to address this issue is to throttle analytical MR jobs so that real-time workloads are less affected.

Read More

Sneak Preview: HBaseCon 2015 Use Cases Track

Categories: Community Events HBase

This year’s HBaseCon Use Cases track includes war stories about some of the world’s best examples of running Apache HBase in production.

As a final sneak preview leading up to the show next week, in this post, I’ll give you a window into the HBaseCon 2015’s (May 7 in San Francisco) Use Cases track.

hbasecon logo

Thanks, Program Committee!

  • “HBase @ Flipboard”

Read More