Tag Archives: cloudera’s distribution for hadoop

Map-Reduce With Ruby Using Apache Hadoop

Categories: Hadoop, MapReduce

Guest re-post from Phil Whelan, a large-scale web-services consultant based in Vancouver, BC.

Map-Reduce With Hadoop Using Ruby
Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine.
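As a rough illustration of what map-reduce logic in Ruby can look like, here is a minimal word-count sketch. It is not taken from the post itself: the function names (`map_line`, `reduce_pairs`, `word_count`) are hypothetical, and it simulates the map, sort and reduce phases in memory rather than running on a cluster. It assumes the usual Hadoop Streaming convention in which a mapper emits key/value pairs and a reducer receives them grouped by key.

```ruby
# Word count written in a map-reduce style.
# All names below are illustrative assumptions, not code from the post.

# Map step: turn one line of input text into [word, 1] pairs.
def map_line(line)
  line.downcase.scan(/[a-z0-9']+/).map { |word| [word, 1] }
end

# Reduce step: sum the counts for each key. On a real cluster Hadoop
# sorts mapper output by key before the reducer sees it; this sketch
# accumulates into a Hash, so it does not depend on that ordering.
def reduce_pairs(pairs)
  counts = Hash.new(0)
  pairs.each { |word, count| counts[word] += count }
  counts
end

# Local simulation of map -> sort -> reduce over in-memory lines.
def word_count(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  reduce_pairs(pairs.sort)
end

# Print tab-separated "word<TAB>count" lines, the format Hadoop
# Streaming scripts conventionally write to standard output.
result = word_count(["Hadoop with Ruby", "ruby on hadoop"])
result.sort.each { |word, count| puts "#{word}\t#{count}" }
```

In a streaming job, the map and reduce steps would normally live in two separate scripts that read from standard input and write to standard output; keeping them as plain functions here makes the logic easy to try locally first.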


A profile of Apache Hadoop MapReduce computing efficiency (continued)

Categories: Community, Guest, Hadoop, MapReduce

Guest post from Paul Burkhardt, a Research Developer at SRA International, Inc. where he develops large-scale, distributed computing solutions.

Part II

Previously, we proposed a way to measure performance in Hadoop MapReduce applications in an effort to better understand their computing efficiency. In this part, we describe some results and highlight both good and bad characteristics.

We selected our SIFT-M MapReduce application, described in our presentation at Hadoop World 2010 [3],


A profile of Apache Hadoop MapReduce computing efficiency

Categories: Community, Guest, Hadoop, MapReduce

Guest post from Paul Burkhardt, a Research Developer at SRA International, Inc. where he develops large-scale, distributed computing solutions.

Part I

One of our customers asked us to investigate Hadoop MapReduce for solving distributed computing problems. We were particularly interested in how effectively MapReduce applications utilize computing resources. Computing efficiency is important not only for speed-up and scale-out performance but also for power consumption. Consider a hypothetical High-Performance Computing (HPC) system of 10,000 nodes running 50% idle at 50 watts per idle node,


Migrating to CDH

Categories: General, Hadoop, HBase, HDFS, Hive, MapReduce, Pig, ZooKeeper

With the recent release of CDH3b2, many users are more interested than ever in trying out Cloudera’s Distribution for Hadoop (CDH). One of the questions we often hear is, “What does it take to migrate?”

Why Migrate?

If you’re not familiar with CDH3b2, here’s what you need to know.

All versions of CDH provide:

  • RPM and Debian packages for simple installation and management.
  • Clean integration with the host operating system.
