Tag Archives: python

10 MapReduce Tips

Categories: General Hadoop MapReduce

This piece is based on the talk “Practical MapReduce” that I gave at Hadoop User Group UK on April 14.

1. Use an appropriate MapReduce language

There are many languages and frameworks that sit on top of MapReduce, so it’s worth thinking up-front which one to use for a particular problem. There is no one-size-fits-all language; each has different strengths and weaknesses.

  • Java: Good for: speed;

Read More

State of the Elephant 2008

Categories: Community General Hadoop

It’s a new year, the time when we take a moment to look back at the previous one, and forward to what might be coming next. In the world of Hadoop a lot happened in 2008.

Organization

At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added,

Read More

Sending Files to Remote Task Nodes with Hadoop MapReduce

Categories: Hadoop MapReduce

It is common for a MapReduce program to require one or more files to be read by each map or reduce task before execution. For example, you may have a lookup table that needs to be parsed before processing a set of records. To address this scenario, Hadoop’s MapReduce implementation includes a distributed file cache that will manage copying your file(s) out to the task execution nodes.

The DistributedCache was introduced in Hadoop 0.7.0;

Read More

Installing Scribe For Log Collection

Categories: Data Ingestion

Scribe is a newly released log collection tool that dumps log files from various nodes in a cluster to Scribe servers, where the logs are stored for further use.  Facebook describes their usage of Scribe by saying, “[Scribe] runs on thousands of machines and reliably delivers tens of billions of messages a day.”  It turns out that Scribe is rather difficult to install, so the hope of this post is to help those of you attempting to install Scribe.

Read More