Category Archives: Cloudera Labs

YCSB, the Open Standard for NoSQL Benchmarking, Joins Cloudera Labs

Categories: Cloudera Labs HBase Performance

YCSB, the open standard for comparative performance evaluation of data stores, is now available to CDH users for their Apache HBase deployments via new packages from Cloudera Labs.

Many factors go into deciding which data store should be used for production applications, including basic features, data model, and the performance characteristics for a given type of workload. It’s critical to have the ability to compare multiple data stores intelligently and objectively so that you can make sound architectural decisions.

Read more

Apache Spark Comes to Apache HBase with HBase-Spark Module

Categories: Cloudera Labs Community HBase Spark

The SparkOnHBase project in Cloudera Labs was recently merged into the Apache HBase trunk. In this post, learn the project’s history and what the future looks like for the new HBase-Spark module.

SparkOnHBase was first pushed to Github on July 2014, just six months after Spark Summit 2013 and five months after Apache Spark first shipped in CDH. That conference was a big turning point for me,

Read more

Getting Started with Ibis and How to Contribute

Categories: Cloudera Labs Impala

Learn about the architecture of Ibis, the roadmaps for Ibis and Impala, and how to get started and contribute.

We created Ibis, a new Python data analysis framework now incubating in Cloudera Labs, with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop,

Read more

Ibis on Impala: Python at Scale for Data Science

Categories: Cloudera Labs Data Science Impala

This new Cloudera Labs project promises to deliver the great Python user experience and ecosystem at Hadoop scale.

Across the user community, you will find general agreement that the Apache Hadoop stack has progressed dramatically in just the past few years. For example, Search and Impala have moved Hadoop beyond batch processing, while developers are seeing significant productivity gains and additional use cases by transitioning from MapReduce to Apache Spark.

Thanks to such advances in the ecosystem,

Read more

Apache Phoenix Joins Cloudera Labs

Categories: Cloudera Labs HBase

We are happy to announce the inclusion of Apache Phoenix in Cloudera Labs.

[Update: A new package for Apache Phoenix 4.7.0 on CDH 5.7 was released in June 2016.]

Apache Phoenix is an efficient SQL skin for Apache HBase that has created a lot of buzz. Many companies are successfully using this technology, including Salesforce.com, where Phoenix first started.

Phoenix logo

With the news that Apache Phoenix integration with Cloudera’s platform has joined Cloudera Labs,

Read more