5 Common Questions About Apache Hadoop

Categories: General Hadoop

There’s been a lot of buzz about Apache Hadoop lately. Just the other day, some of our friends at Yahoo! reclaimed the terasort record from Google using Hadoop, and the folks at Facebook let on that they ingest 15 terabytes a day into their 2.5 petabyte Hadoop-powered data warehouse.

But many people still find themselves wondering just how all this works, and what it means to them. We get a lot of common questions while working with customers,


Using Cloudera’s Hadoop AMIs to process EBS datasets on EC2

Categories: Community General Guest Hadoop

A while back, we noticed a blog post from Arun Jacob over at Evri (if you haven’t seen Evri before, it’s a pretty impressive take on search UI). We were particularly interested in helping Arun and others use EC2 and Hadoop to process data stored on EBS, since Amazon makes many public data sets available there. After getting started, Arun volunteered to write up his experience, and we’re happy to share it on the Cloudera blog. -Christophe

Background
A couple of weeks ago I managed to get a Hadoop cluster up and running on EC2 using the /src/contrib/ec2 scripts found in the 0.18.3 version of Hadoop.


High Energy Hadoop

Categories: General Guest Hadoop HDFS

We asked Brian Bockelman, a Postdoctoral Research Associate in the Computer Science & Engineering Department at the University of Nebraska–Lincoln, to tell us how Hadoop is being used to process the results of High-Energy Physics experiments. His response offers insight into the kind and volume of data these experiments generate and how Hadoop is being used at the University of Nebraska. -Matt

In the least technical language,


Debian packages for Apache Hadoop

Categories: Community Hadoop

When we announced Cloudera’s Distribution for Apache Hadoop last month, we asked the community to give us feedback on what features they liked best and what new development was most important to them. Almost immediately, Debian and Ubuntu packages for Hadoop emerged as the most popular request. A lot of customers prefer Debian derivatives over Red Hat, and installing RPMs on top of Debian, while possible with tools like alien,
