Hadoop Core version 0.20.0 was released on April 22. In this post I will run through some of the larger or more significant user-facing changes since the first 0.19 releasethere were 262 Jiras fixed for this release (fewer than the 303 in 0.19.0). The full list, which includes many bug fixes, can be found in the change log (or in Jira), and in the release notes.
We asked Brian Bockelman, a Post Doc Research Associate in the Computer Science & Engineering Department at the University of NebraskaLincoln, to tell us how Hadoop is being used to process the results from High-Energy Physics experiments. His response gives insights into the kind and volume of data that High-Energy Physics experiments generate and how Hadoop is being used at the University of Nebraska. -Matt
In the least technical language,
Configuring a Hadoop cluster is something akin to voodoo. There are a large number of variables in hadoop-default.xml that you can override in hadoop-site.xml. Some specify file paths on your system, but others adjust levers and knobs deep inside Hadoop’s guts. Unfortuately, there’s little or no documentation on how to set them well. Is there a single optimal configuration? Are there some settings that can just be “set to 11?”
(Added 6/4/2013) Please note the instructions below are deprecated. Please refer to the CDH4 Security Guide for up-to-date procedures.
A few weeks ago we ran an Apache Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded on in advance. One challenge we had to face was, how do we do this in a secure way?
(guest blog post by Matei Zaharia)
When Apache Hadoop started out, it was designed mainly for running large batch jobs such as web indexing and log mining. Users submitted jobs to a queue, and the cluster ran them in order. However, as organizations placed more data in their Hadoop clusters and developed more computations they wanted to run, another use case became attractive: sharing a MapReduce cluster between multiple users.