Category Archives: HBase

YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs

Categories: Cloudera Labs HBase Performance

YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

Read More

Apache Spark Comes to Apache HBase with HBase-Spark Module

Categories: Cloudera Labs Community HBase Spark

The SparkOnHBase project in Cloudera Labs was recently merged into the Apache HBase trunk. In this post, learn the project’s history and what the future looks like for the new HBase-Spark module.

SparkOnHBase was first pushed to Github on July 2014, just six months after Spark Summit 2013 and five months after Apache Spark first shipped in CDH. That conference was a big turning point for me,

Read More

Inside Santander’s Near Real-Time Data Ingest Architecture

Categories: Flume HBase Kafka

Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK.

Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store a transaction within a few seconds of a card purchase taking place. The system receives the bank’s retail customer card transactions and calculates the associated trend information aggregated by account holder and over a number of dimensions and taxonomies.

Read More

Designing Fraud-Detection Architecture That Works Like Your Brain Does

Categories: Flume HBase Kafka Spark Use Case

To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).

At its core, fraud detection is about detection whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,

Read More

Thrift Client Authentication Support in Apache HBase 1.0

Categories: HBase

Thrift client authentication and doAs impersonation, introduced in HBase 1.0, provides more flexibility for your HBase installation.

In the two-part blog series “How-to: Use the HBase Thrift Interface” (Part 1 and Part 2), Jesse Anderson explained the Thrift interface in detail, and demonstrated how to use it. He didn’t cover running Thrift in a secure Apache HBase cluster, however, because there was no difference in the client configuration with the HBase releases available at that time.

Read More