The Small Files Problem

Categories: General Hadoop

Small files are a big problem in Hadoop — or, at least, they are if the number of questions on the user list on this topic is anything to go by. In this post I’ll look at the problem, and examine some common solutions.

Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files,

Read more

State of the Elephant 2008

Categories: Community General Hadoop

It’s a new year, the time when we take a moment to look back at the previous one, and forward to what might be coming next. In the world of Hadoop a lot happened in 2008.

Organization

At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added,

Read more

Testing Apache Hadoop

Categories: General Hadoop

As a developer coming to Apache Hadoop it is important to understand how testing is organized in the project. For the most part it is simple — it’s really just a lot of JUnit tests — but there are some aspects that are not so well known.

Running Hadoop Unit Tests

Let’s have a look at some of the tests in Hadoop Core, and see how to run them. First check out the Hadoop Core source,

Read more