Category Archives: CDH

Experimenting with MapReduce 2.0

Categories: CDH General MapReduce

In Building and Deploying MR2, we presented a brief introduction to MapReduce in Apache Hadoop 0.23 and focused on the steps to setup a single-node cluster. In MapReduce 2.0 in Hadoop 0.23, we discussed the new architectural aspects of the MapReduce 2.0 design. This blog post highlights the main issues to consider when migrating from MapReduce 1.0 to MapReduce 2.0. Note that both MapReduce 1.0 and MapReduce 2.0 are included in CDH4.

Read more

Watching the Clock: Cloudera’s Response to Leap Second Troubles

Categories: CDH Cloudera Manager Community General Hadoop Support Testing

At 5 pm PDT on June 30, a leap second was added to the Universal Coordinated Time (UTC). Within an hour, Cloudera Support started receiving reports of systems running at 100% CPU utilization. The Support Team worked quickly to understand and diagnose the problem and soon published a solution. Bugs due to the leap second coupled with the Amazon Web Services outage would make this Cloudera’s busiest support weekend to date.

Since Hadoop is written in Java and closely interoperates with the underlying OS,

Read more

Apache HBase Write Path

Categories: CDH General HBase

Apache HBase is the Hadoop database, and is based on the Hadoop Distributed File System (HDFS). HBase makes it possible to randomly access and update data stored in HDFS, but files in HDFS can only be appended to and are immutable after they are created.  So you may ask, how does HBase provide low-latency reads and writes? In this blog post, we explain this by describing the write path of HBase —

Read more

The Elephant in the Enterprise

Categories: CDH Community General

On Tuesday, June 12th The Churchill Club of Silicon Valley hosted a panel discussion on Hadoop’s evolution from an open-source project to becoming a standard component of today’s enterprise computing fabric. The lively and dynamic discussion was moderated by Cade Metz, Editor, Wired Enterprise.

Panelists included:

Michael Driscoll, CEO, Metamarkets
Andrew Mendelsohn, SVP, Oracle Server Technologies
Mike Olson, CEO, Cloudera
Jay Parikh,

Read more

CDH4 and Cloudera Enterprise 4.0 Now Available

Categories: CDH Cloudera Manager General

I’m very pleased to announce the immediate General Availability of CDH4 and Cloudera Manager 4 (part of the Cloudera Enterprise 4.0 subscription).  These releases are an exciting milestone for Cloudera customers, Cloudera users and the open source community as a whole.

Functionality
Both CDH4 and Cloudera Manager 4 are chock full of new features. Many new features will appeal to enterprises looking to move  more important workloads onto the Apache Hadoop platform.

Read more