Introducing Sqoop

Categories: Data Ingestion General Hadoop Hive

In addition to providing you with a dependable, easy-to-configure release of Hadoop, we at Cloudera also focus on developing tools that extend Hadoop’s usability and make Hadoop a more central component of your data infrastructure. In this vein, we’re proud to announce the availability of Sqoop, a tool designed to easily import information from SQL databases into your Hadoop cluster.

Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

  • Imports individual tables or entire databases to files in HDFS
  • Generates Java classes to allow you to interact with your imported data
  • Provides the ability to import from SQL databases straight into your Hive data warehouse

After setting up an import job in Sqoop,

Read more

Building a distributed concurrent queue with Apache ZooKeeper

Categories: ZooKeeper

In my first few weeks here at Cloudera, I’ve been tasked with helping out with the Apache ZooKeeper system, part of the umbrella Hadoop project. ZooKeeper is a system for coordinating distributed processes. In a distributed environment, getting processes to act in any kind of synchrony is an extremely hard problem. For example, simply having a set of processes wait until they’ve all reached the same point in their execution –

Read more

Announcing Cloudera Certification for Apache Hadoop

Categories: Community General Hadoop Training

As Apache Hadoop continues to turn heads at startups and big enterprises alike, Cloudera has received several requests to offer certification in addition to our popular training programs.

Certification is a critical component of any software ecosystem, and especially so for open source projects with quickly expanding user bases. Certification allows developers to ensure their skills are up to date, and allows employers and customers to confidently identify individuals who are up for the challenge of solving problems with Hadoop.

Read more

Announcing Hadoop World: NYC 2009: RFP Open

Categories: General

Lately, we’ve been spending a lot of time on the East Coast, and one thing is clear: Hadoop is everywhere.

Hadoop usage on the East Coast tends to be slightly different. There are still web companies with armies of tech gurus, but there are also many “regular” industries and enterprises using and exploring Hadoop. It’s time to get together and learn a thing or two from one another.

Hadoop World: NYC 2009 will take place on October 2nd,

Read more

Protecting per-DataNode Metadata

Categories: Hadoop HDFS

Administrators of HDFS clusters understand that the HDFS metadata is some of the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks”, to be reassembled into coherent, ordered files.

The techniques to preserve HDFS NameNode metadata are well established. You should store several copies across many separate local hard drives,

Read more