Category Archives: CDH

Apache Hadoop in 2011

Categories: CDH Community General Hadoop

2011 was a breakthrough year for Apache Hadoop: many more mainstream organizations, large and small, turned to Hadoop to manage and process Big Data, and enterprise software and hardware vendors made Hadoop a prominent part of their offerings. Big Data and Hadoop became synonymous in much of the enterprise discourse, and interest in Big Data is not restricted to Big Companies.

Apache Hadoop Releases

Hadoop had three major releases in 2011: 1.0 (AKA 0.20.205.x),

Read more

FoneDoktor, A WibiData Application

Categories: CDH Hadoop HBase Use Case

This guest blog post is from Alex Loddengaard, creator of FoneDoktor, an Android app that monitors phone usage and recommends performance and battery life improvements. FoneDoktor uses WibiData, a data platform built on Apache HBase from Cloudera’s Distribution including Apache Hadoop, to store and analyze Android usage data. In this post, Alex describes FoneDoktor’s implementation and explains why WibiData was a good data solution.

Read more

Hadoop World 2011: A Glimpse into Development

Categories: Avro Careers CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop Training Use Case ZooKeeper

The Development track at Hadoop World is a technical deep dive into Apache Hadoop and application development for it. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools,

Read more

Automatically Documenting Apache Hadoop Configuration

Categories: CDH Hadoop

Ari Rabkin is a summer intern at Cloudera, working with the engineering team to help make Hadoop more usable and simpler to configure. The rest of the year, Ari is a PhD student at UC Berkeley. He’s applying the results of recent research to automatically find and document configuration options for Hadoop.


Hadoop has a key-value style of configuration, where each configuration option has a name and a value. There is no central list of options,

Read more