For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy infrastructure, the TVA is doing everything they can to keep up with the volumes of data generated by the “smart grid.” But as you know, storing that data is only half the battle. In this guest blog post, the TVA’s Josh Patterson goes into detail about how Hadoop enables them to conduct deeper analysis over larger data sets at considerably lower costs than existing solutions.
A while back, we noticed a blog post from Arun Jacob over at Evri (if you haven’t seen Evri before, it’s a pretty impressive take on search UI). We were particularly interested in helping Arun and others use EC2 and Hadoop to process data stored on EBS, as Amazon makes many public data sets available there. After getting started, Arun volunteered to write up his experience, and we’re happy to share it on the Cloudera blog. -Christophe
A couple of weeks ago I managed to get a Hadoop cluster up and running on EC2 using the /src/contrib/ec2 scripts found in the 0.18.3 version of Hadoop.
Hadoop Core version 0.20.0 was released on April 22. In this post I will run through some of the larger or more significant user-facing changes since the first 0.19 release. There were 262 Jiras fixed for this release (fewer than the 303 in 0.19.0). The full list, which includes many bug fixes, can be found in the change log (or in Jira), and in the release notes.
We asked Brian Bockelman, a Post Doc Research Associate in the Computer Science & Engineering Department at the University of Nebraska-Lincoln, to tell us how Hadoop is being used to process the results from High-Energy Physics experiments. His response gives insights into the kind and volume of data that High-Energy Physics experiments generate and how Hadoop is being used at the University of Nebraska. -Matt
In the least technical language,
Configuring a Hadoop cluster is something akin to voodoo. There are a large number of variables in hadoop-default.xml that you can override in hadoop-site.xml. Some specify file paths on your system, but others adjust levers and knobs deep inside Hadoop’s guts. Unfortunately, there’s little or no documentation on how to set them well. Is there a single optimal configuration? Are there some settings that can just be “set to 11?”
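To make the override mechanism concrete, here is a minimal sketch of a hadoop-site.xml from the 0.18/0.19 era. The property names (mapred.tasktracker.map.tasks.maximum, io.sort.mb) are real knobs defined in hadoop-default.xml of that vintage, but the values shown are illustrative starting points, not tuned recommendations.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- hadoop-site.xml: any property set here overrides the
     default value shipped in hadoop-default.xml. -->
<configuration>

  <!-- Concurrent map tasks per TaskTracker. A common rule of
       thumb is roughly one per core, but the right value is
       workload-dependent; 4 here is purely illustrative. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <!-- Buffer size (MB) used when sorting map output. The
       shipped default is 100; raising it trades memory for
       fewer spills to disk. -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>

</configuration>
```

Any property left out of this file silently keeps its hadoop-default.xml value, which is part of why cluster tuning feels like voodoo: the effective configuration is the merge of both files.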