Announcing Hadoop World: NYC 2009: RFP Open

Categories: General

Lately, we’ve been spending a lot of time on the East Coast, and one thing is clear: Hadoop is everywhere.

Hadoop usage on the East Coast tends to be slightly different. There are still web companies with armies of tech gurus, but there are also many “regular” industries and enterprises using and exploring Hadoop. It’s time to get together and learn a thing or two from one another.

Hadoop World: NYC 2009 will take place on October 2nd,


Protecting per-DataNode Metadata

Categories: Hadoop HDFS

Administrators of HDFS clusters understand that HDFS metadata is among the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks”, to be reassembled into coherent, ordered files.
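To make the point concrete, here is a toy Python model. It is not HDFS code, and every name in it is hypothetical: `store` stands in for the block contents held on DataNodes, and `namespace` stands in for the NameNode's metadata, recording which blocks make up each file and in what order. Lose `namespace` and the bytes are all still in `store`, but nothing says which blocks belong together or in what sequence.

```python
"""Toy model of why NameNode metadata matters (hypothetical, heavily simplified)."""

from typing import Dict, List

# Hypothetical block store: block ID -> raw bytes (stands in for DataNodes).
BlockStore = Dict[int, bytes]
# Hypothetical namespace: file path -> ordered block IDs (stands in for NameNode metadata).
Namespace = Dict[str, List[int]]


def split_into_blocks(data: bytes, block_size: int, first_id: int) -> Dict[int, bytes]:
    """Cut data into fixed-size blocks with consecutive IDs; the last block may be short."""
    return {
        first_id + i: data[off:off + block_size]
        for i, off in enumerate(range(0, len(data), block_size))
    }


def reassemble(path: str, namespace: Namespace, store: BlockStore) -> bytes:
    """Rebuild a file by concatenating its blocks in the order the metadata records."""
    return b"".join(store[block_id] for block_id in namespace[path])


if __name__ == "__main__":
    store: BlockStore = {}
    namespace: Namespace = {}
    blocks = split_into_blocks(b"hello, distributed world", block_size=8, first_id=100)
    store.update(blocks)
    namespace["/logs/greeting.txt"] = sorted(blocks)  # the metadata: an ordered block list
    print(reassemble("/logs/greeting.txt", namespace, store))  # b'hello, distributed world'
```

Real HDFS blocks are tens of megabytes rather than eight bytes, but the dependency is the same: the block data alone cannot be turned back into files.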

The techniques to preserve HDFS NameNode metadata are well established. You should store several copies across many separate local hard drives,

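In Hadoop releases of this era, the usual way to keep several copies of the NameNode metadata is to list multiple directories, comma-separated, in the `dfs.name.dir` property; the NameNode then writes its name table to each of them. The Python sketch below only illustrates that idea and is not Hadoop's implementation: the hypothetical helpers `save_metadata` and `load_metadata` write the same image into every configured directory and read back the first copy whose checksum still matches.

```python
import hashlib
import os
import shutil
import tempfile
from typing import List, Optional


def save_metadata(image: bytes, dirs: List[str], name: str = "fsimage") -> None:
    """Write the same metadata image into every directory (ideally on separate disks)."""
    digest = hashlib.sha256(image).hexdigest()
    for d in dirs:
        os.makedirs(d, exist_ok=True)
        tmp = os.path.join(d, name + ".tmp")
        with open(tmp, "wb") as f:
            f.write(image)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, os.path.join(d, name))  # rename so readers never see a half-written file
        with open(os.path.join(d, name + ".sha256"), "w") as f:
            f.write(digest)


def load_metadata(dirs: List[str], name: str = "fsimage") -> Optional[bytes]:
    """Return the first copy whose checksum matches; None if every copy is lost."""
    for d in dirs:
        try:
            with open(os.path.join(d, name), "rb") as f:
                image = f.read()
            with open(os.path.join(d, name + ".sha256")) as f:
                expected = f.read().strip()
        except OSError:
            continue  # directory or drive unavailable: try the next copy
        if hashlib.sha256(image).hexdigest() == expected:
            return image
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, "disk1", "name"), os.path.join(root, "disk2", "name")]
        save_metadata(b"namespace image", dirs)
        shutil.rmtree(dirs[0])  # simulate losing the first drive
        print(load_metadata(dirs))  # b'namespace image'
```

The copies only help if the directories really live on different physical drives; two copies on the same disk fail together.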

10 MapReduce Tips

Categories: General Hadoop MapReduce

This piece is based on the talk “Practical MapReduce” that I gave at Hadoop User Group UK on April 14.

1. Use an appropriate MapReduce language

There are many languages and frameworks that sit on top of MapReduce, so it’s worth thinking up front about which one to use for a particular problem. There is no one-size-fits-all language; each has different strengths and weaknesses.

  • Java: Good for: speed;


5 Common Questions About Apache Hadoop

Categories: General Hadoop

There’s been a lot of buzz about Apache Hadoop lately. Just the other day, some of our friends at Yahoo! reclaimed the terasort record from Google using Hadoop, and the folks at Facebook let on that they ingest 15 terabytes a day into their 2.5 petabyte Hadoop-powered data warehouse.

But many people still find themselves wondering just how all this works, and what it means to them. We get a lot of common questions while working with customers,


Using Cloudera’s Hadoop AMIs to process EBS datasets on EC2

Categories: Community General Guest Hadoop


A while back, we noticed a blog post from Arun Jacob over at Evri (if you haven’t seen Evri before, it’s a pretty impressive take on search UI). We were particularly interested in helping Arun and others use EC2 and Hadoop to process data stored on EBS, since Amazon makes many public data sets available on EBS. After getting started, Arun volunteered to write up his experience, and we’re happy to share it on the Cloudera blog.  -Christophe

Background
A couple of weeks ago I managed to get a Hadoop cluster up and running on EC2 using the /src/contrib/ec2 scripts found in the 0.18.3 version of Hadoop.
