Tag Archives: install hadoop

Exploring Compression for Hadoop: One DBA’s Story

Categories: General Guest HDFS Ops and DevOps

This guest post comes to us courtesy of Gwen Shapira (@gwenshap), a database consultant for The Pythian Group (and an Oracle ACE Director).

Most western countries use street names and numbers to navigate inside cities. But in Japan, where I live now, very few streets have them.

Sometimes solving technical problems is similar to navigating a city without many street names: Once you arrive at the desired location,

Read more

Setting up CDH3 Hadoop on my new MacBook Pro

Categories: Community Hadoop

This is a guest re-post courtesy of Arun Jacob, Data Architect at Disney. Prior to that, he was an engineer at RichRelevance and Evri. For the last couple of years, Arun has been focused on data mining/information extraction, using a mix of custom and open source technologies.

A New Machine

I’m fortunate enough to have recently received a MacBook Pro, 2.8 GHz Intel dual core, with 8GB RAM. This is the third time I’ve turned a vanilla Mac into a ninja coding machine,

Read more

Map-Reduce With Ruby Using Apache Hadoop

Categories: Hadoop MapReduce

Guest re-post from Phil Whelan, a large-scale web-services consultant based in Vancouver, BC.

Map-Reduce With Hadoop Using Ruby
Here I demonstrate, with repeatable steps, how to fire up a Hadoop cluster on Amazon EC2, load data onto HDFS (the Hadoop Distributed File System), write map-reduce scripts in Ruby, and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine.

Read more

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged up Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands, depending on their Linux distribution:

As proof of this,

Read more

Apache Hadoop HA Configuration

Categories: General Guest Hadoop HDFS

Disclaimer: Cloudera no longer approves of the recommendations in this post. Please see this documentation for configuration recommendations.

We get a lot of questions about how to make Hadoop highly available. There is still a lot of work to be done on this front, but we wanted to take a moment and share the best practices from one of our customers. Check out what Paul George has to say about how they keep their NameNode up at ContextWeb.

Read more