Tag Archives: use cases

How-to: Do Data Quality Checks using Apache Spark DataFrames

Categories: How-to Spark

Apache Spark’s ability to support data quality checks via DataFrames is progressing rapidly. This post explains the state of the art and future possibilities.

Apache Hadoop and Apache Spark make Big Data accessible and usable so that we can easily find value in it, but that data has to be correct first. This post focuses on that problem and how to solve it with DataFrames in Apache Spark 1.3 and Apache Spark 1.4.

Read More

How-to: Scan Salted Apache HBase Tables with Region-Specific Key Ranges in MapReduce

Categories: Guest HBase How-to

Thanks to Pengyu Wang, software developer at FINRA, for permission to republish this post.

Pre-split, salted Apache HBase tables are a proven, effective way to distribute workload uniformly across RegionServers and prevent hot spots during bulk writes. In this design, a row key consists of a salt followed by the logical key. One way of generating the salt is to take the hash code of the logical row key modulo n (the number of regions) (date,
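The salting scheme described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the code from the original post: the function name `salted_row_key`, the zero-padded three-digit salt, the `-` separator, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def salted_row_key(logical_key: str, num_regions: int, width: int = 3) -> str:
    """Return the logical key prefixed with a salt in the range [0, num_regions).

    Hypothetical helper: the salt format and hash function are assumptions,
    not taken from the original post.
    """
    if num_regions <= 0:
        raise ValueError("num_regions must be positive")
    # CRC32 stands in for "the hash code of the logical row key"; any hash
    # that is stable across processes would serve the same purpose.
    salt = zlib.crc32(logical_key.encode("utf-8")) % num_regions
    # Zero-pad the salt so that salted keys sort by salt bucket first.
    return f"{salt:0{width}d}-{logical_key}"


if __name__ == "__main__":
    # Keys that sort next to each other logically may land in different buckets.
    for key in ["2015-05-01|A", "2015-05-01|B", "2015-05-02|A"]:
        print(salted_row_key(key, num_regions=8))
```

Because the salt is derived from the key itself, a reader who knows the logical key can recompute the full row key, while writes of adjacent logical keys can spread across the pre-split regions.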

Read More

Apache Phoenix Joins Cloudera Labs

Categories: Cloudera Labs HBase

We are happy to announce the inclusion of Apache Phoenix in Cloudera Labs.

[Update: A new package for Apache Phoenix 4.7.0 on CDH 5.7 was released in June 2016.]

Apache Phoenix is an efficient SQL skin for Apache HBase that has created a lot of buzz. Many companies are successfully using this technology, including Salesforce.com, where Phoenix originated.

With the news that Apache Phoenix integration with Cloudera’s platform has joined Cloudera Labs,

Read More

Sneak Preview: HBaseCon 2015 Use Cases Track

Categories: Community Events HBase

This year’s HBaseCon Use Cases track includes war stories about some of the world’s best examples of running Apache HBase in production.

As a final sneak preview leading up to the show next week, in this post I’ll give you a window into the Use Cases track at HBaseCon 2015 (May 7 in San Francisco).

Thanks, Program Committee!

  • “HBase @ Flipboard”

Read More

Sneak Preview: HBaseCon 2015 Ecosystem Track

Categories: Community Events HBase

This year’s HBaseCon Ecosystem track covers projects that are complementary to HBase (with a focus on SQL) such as Apache Phoenix, Apache Kylin, and Trafodion.

In this post, I’ll give you a window into the Ecosystem track at HBaseCon 2015 (May 7 in San Francisco).

Thanks, Program Committee!

  • “HBase as an IoT Stream Analytics Platform for Parkinson’s Disease Research”

Read More