Category Archives: CDH

Automatically Documenting Apache Hadoop Configuration

Categories: CDH Hadoop

Ari Rabkin is a summer intern at Cloudera, working with the engineering team to help make Hadoop more usable and simpler to configure. The rest of the year, Ari is a PhD student at UC Berkeley. He’s applying the results of recent research to automatically find and document configuration options for Hadoop.


Hadoop has a key-value style of configuration, where each configuration option has a name and a value. There is no central list of options,

Read more

Evolution of Hadoop Ecosystem: AOL Advertising Experience

Categories: CDH Data Ingestion General Guest Use Case

Pero works on research and development in new technologies for online advertising at Aol Advertising R&D in Palo Alto. Over the past four years he has been the Chief Architect of the R&D distributed ecosystem, which comprises more than a thousand nodes in multiple data centers. He has also led large-scale contextual analysis, segmentation, and machine learning efforts at AOL, Yahoo, and Cadence Design Systems, and has published patents and research papers in these areas.

A critical premise for the success of online advertising networks is to successfully collect,

Read more

SCM Express: Now Anyone Can Experience the Power of Apache Hadoop

Categories: CDH General

Phil Langdale is a software engineer at Cloudera and the technical lead for Cloudera’s SCM Express product.

What is SCM Express?

As powerful and useful as Apache Hadoop is, anyone who has set up a cluster from scratch is well aware of how challenging it can be: every machine has to have the right packages installed and correctly configured so that they can all work together,

Read more