For the last few months, we’ve been working with the TVA to help them manage hundreds of TB of data from America’s power grids. As the Obama administration investigates ways to improve our energy infrastructure, the TVA is doing everything they can to keep up with the volumes of data generated by the “smart grid.” But as you know, storing that data is only half the battle. In this guest blog post, the TVA’s Josh Patterson goes into detail about how Hadoop enables them to conduct deeper analysis over larger data sets at considerably lower costs than existing solutions. Read more
In addition to providing you with a dependable release of Hadoop that is easy to configure, at Cloudera we also focus on developing tools to extend Hadoop’s usability, and make Hadoop a more central component of your data infrastructure. In this vein, we’re proud to announce the availability of Sqoop, a tool designed to easily import information from SQL databases into your Hadoop cluster.
Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:
- Imports individual tables or entire databases to files in HDFS
- Generates Java classes to allow you to interact with your imported data
- Provides the ability to import from SQL databases straight into your Hive data warehouse
After setting up an import job in Sqoop,
In my first few weeks here at Cloudera, I’ve been tasked with helping out with the Apache ZooKeeper system, part of the umbrella Hadoop project. ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution –
Administrators of HDFS clusters understand that the HDFS metadata is some of the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks” to be reassembled into coherent, ordered files.
The techniques to preserve HDFS NameNode metadata are well established. You should store several copies across many separate local hard drives,
This piece is based on the talk Practical MapReduce that I gave at Hadoop User Group UK on April 14.
1. Use an appropriate MapReduce language
There are many languages and frameworks that sit on top of MapReduce, so its worth thinking up-front which one to use for a particular problem. There is no one-size-fits-all language; each has different strengths and weaknesses.
- Java: Good for: speed;