One of the repeating themes we have heard while working with our customers and the community is that Apache Hadoop configuration and deployment is a pain. Often times, Hadoop is the first truly distributed system that administrators encounter, and the problem is made worse by the lack of standardized packages and deployment tools. And some releases are buggy. And upgrades are hard. And the list goes on.
In order for Hadoop to truly disrupt the enterprise,
It’s a new year, the time when we take a moment to look back at the previous one, and forward to what might be coming next. In the world of Hadoop a lot happened in 2008.
At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added,
The first release (0.19.0) from the 0.19 branch of Apache Hadoop Core was made on November 24. Many changes go into a release like this, and it can be difficult to get a feel for the more significant ones, even with the detailed Jira log, change log, and release notes. (There’s also JDiff documentation, which is a great way to see how the public API changed,
We’re happy to announce a new tool we have been developing here at Cloudera: Hadoop Development Status. Hadoop Development Status aims to help the Hadoop community understand its direction, health, and participants. The project currently monitors the most active contributors according to mailing list traffic, the most watched JIRA tickets, and aggregate traffic volumes on the Hadoop mailing lists.
The graph of messages per month on the Hadoop Core lists shows a sustained growth in traffic.
It is common for a MapReduce program to require one or more files to be read by each map or reduce task before execution. For example, you may have a lookup table that needs to be parsed before processing a set of records. To address this scenario, Hadoop’s MapReduce implementation includes a distributed file cache that will manage copying your file(s) out to the task execution nodes.
The DistributedCache was introduced in Hadoop 0.7.0;