We’re happy to announce a new tool we have been developing here at Cloudera: Hadoop Development Status. Hadoop Development Status aims to help the Hadoop community understand its direction, health, and participants. The project currently monitors the most active contributors according to mailing list traffic, the most watched JIRA tickets, and aggregate traffic volumes on the Hadoop mailing lists.
The graph of messages per month on the Hadoop Core lists shows sustained growth in traffic.
It is common for a MapReduce program to require one or more files to be read by each map or reduce task before execution. For example, you may have a lookup table that needs to be parsed before processing a set of records. To address this scenario, Hadoop’s MapReduce implementation includes a distributed file cache that manages copying your files out to the task execution nodes.
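To make the lookup-table scenario concrete, here is a minimal sketch in plain Java of what a task does once the file is available locally: parse the table a single time, then consult it for every record. It deliberately uses no Hadoop classes. The class name `LookupJoin`, the tab-separated table format, and the comma-separated record format are all assumptions made for illustration, not part of Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration; no Hadoop classes are involved.
class LookupJoin {

    // Parse "key<TAB>value" lines into a map. A task would do this once,
    // before it starts processing records. Lines without a tab are skipped.
    static Map<String, String> parseLookup(List<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                table.put(parts[0], parts[1]);
            }
        }
        return table;
    }

    // Replace the first comma-separated field of each record with its
    // lookup value. Keys missing from the table are left unchanged.
    static List<String> enrich(Map<String, String> table, List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] fields = record.split(",", 2);
            String name = table.getOrDefault(fields[0], fields[0]);
            out.add(fields.length == 2 ? name + "," + fields[1] : name);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> table =
            parseLookup(Arrays.asList("us\tUnited States", "de\tGermany"));
        for (String line : enrich(table, Arrays.asList("us,42", "fr,7"))) {
            System.out.println(line);
        }
    }
}
```

In a real job, the lines passed to `parseLookup` would come from the local copy of the file that the cache places on each task node, and the per-record step would live inside the map or reduce function.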
The DistributedCache was introduced in Hadoop 0.7.0;
As promised in my post about installing Scribe for log collection, I’m going to cover how to configure and use Scribe to collect Hadoop logs. In this post I’ll describe how to create the Scribe Thrift client for use in Java, add a new log4j Appender to Hadoop, configure Scribe, and collect logs from each node in a Hadoop cluster. At the end of the post, I will link to all source and configuration files mentioned in this guide.