Thanks to Pengyu Wang, software developer at FINRA, for permission to republish this post.
Salted Apache HBase tables with pre-split is a proven effective HBase solution to provide uniform workload distribution across RegionServers and prevent hot spots during bulk writes. In this design, a row key is made with a logical key plus salt at the beginning. One way of generating salt is by calculating n (number of regions) modulo on the hash code of the logical row key (date,
The best data protection strategy is to remove sensitive information from everyplace it’s not needed.
Have you ever wondered what sort of “sensitive” information might wind up in Apache Hadoop log files? For example, if you’re storing credit card numbers inside HDFS, might they ever “leak” into a log file outside of HDFS? What about SQL queries? If you have a query like select * from table where creditcard = ‘1234-5678-9012-3456’,
Apache Hive 1.2.0, although not a major release, contains significant improvements.
Recently, the Apache Hive community moved to a more frequent, incremental release schedule. So, a little while ago, we covered the Apache Hive 1.0.0 release and explained how it was renamed from 0.14.1 with only minor feature additions since 0.14.0.
Shortly thereafter, Apache Hive 1.1.0 was released (renamed from Apache Hive 0.15.0), which included more significant features—including Hive-on-Spark.
The following post about the new request throttling feature in HBase 1.1 (now shipping in CDH 5.4) originally published in the ASF blog. We re-publish it here for your convenience.
Running multiple workloads on HBase has always been challenging, especially when trying to execute real-time workloads while concurrently running analytical jobs. One possible way to address this issue is to throttle analytical MR jobs so that real-time workloads are less affected.
This year’s HBaseCon Use Cases track includes war stories about some of the world’s best examples of running Apache HBase in production.
As a final sneak preview leading up to the show next week, in this post, I’ll give you a window into the HBaseCon 2015’s (May 7 in San Francisco) Use Cases track.
Thanks, Program Committee!