Category Archives: CDH

Introducing CDH4 Beta 2

Categories: CDH General

I’m pleased to inform our users and customers that we have released the Cloudera’s Distribution Including Apache Hadoop version 4 (CDH4) 2nd and final beta today. We received great feedback from the community from the first beta and this release incorporates that feedback as well as a number of new enhancements.

CDH4 has a great many enhancements compared to CDH3.

  • Availability – a high availability namenode, better job isolation, improved hard disk failure handling,

Read more

High Availability for the Hadoop Distributed File System (HDFS)

Categories: CDH Community General Hadoop HDFS


Apache Hadoop consists of two primary components: HDFS and MapReduce. HDFS, the Hadoop Distributed File System, is the primary storage system of Hadoop, and is responsible for storing and serving all data stored in Hadoop. MapReduce is a distributed processing framework designed to operate on data stored in HDFS.

HDFS has long been considered a highly reliable file system.  An empirical study done at Yahoo! concluded that across Yahoo!’s 20,000 nodes running Apache Hadoop in 10 different clusters in 2009,

Read more

Thoughts on Cloudera and Cisco UCS reference architecture for Apache Hadoop

Categories: CDH Cloudera Manager

Cloudera and Cisco jointly announced a reference architecture for running Cloudera’s Distribution Including Apache Hadoop (CDH) and Cloudera Manager on Cisco’s Unified Computing System (UCS) last November. It was the first Apache Hadoop reference architecture assembled by Cisco, and is proudly certified by Cloudera.

I bring a different perspective on the Cloudera-Cisco relationship, as I worked for over five years in Cisco on the software powering the Nexus 5000 series switches and the Cisco Virtual Interface Card.

Read more

Indexing Files via Solr and Java MapReduce

Categories: CDH Cloudera Manager

Several weeks ago, I set about to demonstrate the ease with which Solr and Map/Reduce can be integrated. I was unable to find a simple, yet comprehensive, primer on integrating the two technologies. So I set about to write one.

What follows is my bare-bones tutorial on getting Solr up and running to index each word of the complete works of Shakespeare. Note: Special thanks to Sematext for looking over the Solr bits and making sure they are sane.

Read more

MapReduce 2.0 in Apache Hadoop 0.23

Categories: CDH General Hadoop MapReduce

In Building and Deploying MR2 we presented a brief introduction to MapReduce in Apache Hadoop 0.23 and focused on the steps to set up a single-node cluster. This blog provides developers with architectural details of the new MapReduce design. 

Apache Hadoop 0.23 has major improvements over previous releases. Here are a few highlights on the MapReduce front; note that there are also major HDFS improvements, which are out of scope of this post.

Read more