Category Archives: CDH

Hadoop World 2011: A Glimpse into Development

Categories: Avro Careers CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop Training Use Case ZooKeeper

The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools,

Read more

Automatically Documenting Apache Hadoop Configuration

Categories: CDH Hadoop

Ari Rabkin is a summer intern at Cloudera, working with the engineering team to help make Hadoop more usable and simpler to configure. The rest of the year, Ari is a PhD student at UC Berkeley. He’s applying the results of recent research to automatically find and document configuration options for Hadoop.


Hadoop has a key-value style of configuration, where each configuration option has a name and a value. There is no central list of options,

Read more

Evolution of Hadoop Ecosystem: AOL Advertising Experience

Categories: CDH Data Ingestion General Guest Use Case

Pero works on research and development in new technologies for online advertising at Aol Advertising R&D in Palo Alto. Over the past four years he has been the Chief Architect of the R&D distributed ecosystem, which comprises more than a thousand nodes in multiple data centers. He also led large-scale contextual analysis, segmentation and machine learning efforts at AOL, Yahoo and Cadence Design Systems, and has published patents and research papers in these areas.

A critical premise for the success of online advertising networks is to successfully collect,

Read more