Database Access with Apache Hadoop

Categories: General Hadoop MapReduce

Editor’s note (added Nov. 9. 2013): Valuable data in an organization is often stored in relational database systems. To access that data, you could use external APIs as detailed in this blog post below, or you could use Apache Sqoop, an open source tool (packaged inside CDH) that allows users to import data from a relational database into Apache Hadoop for further processing. Sqoop can also export those results back to the database for consumption by other clients.

Read more

Multi-host SecondaryNameNode Configuration

Categories: Hadoop HDFS

You might think that the SecondaryNameNode is a hot backup daemon for the NameNode. You’d be wrong. The SecondaryNameNode is a poorly understood component of the HDFS architecture, but one which provides the important function of lowering NameNode restart time. This blog post describes how to configure this daemon in a large-scale environment. The default Hadoop configuration places an instance of the SecondaryNameNode on the same node as the NameNode. A more scalable configuration involves configuring the SecondaryNameNode on a different machine.

Read more

The Small Files Problem

Categories: General Hadoop

Small files are a big problem in Hadoop — or, at least, they are if the number of questions on the user list on this topic is anything to go by. In this post I’ll look at the problem, and examine some common solutions.

Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files,

Read more

State of the Elephant 2008

Categories: Community General Hadoop

It’s a new year, the time when we take a moment to look back at the previous one, and forward to what might be coming next. In the world of Hadoop a lot happened in 2008.

Organization

At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added,

Read more