You might think that the SecondaryNameNode is a hot backup daemon for the NameNode. You’d be wrong. The SecondaryNameNode is a poorly understood component of the HDFS architecture, but one which provides the important function of lowering NameNode restart time. This blog post describes how to configure this daemon in a large-scale environment. The default Hadoop configuration places an instance of the SecondaryNameNode on the same node as the NameNode. A more scalable configuration involves configuring the SecondaryNameNode on a different machine.
Small files are a big problem in Hadoop — or, at least, they are if the number of questions on the user list on this topic is anything to go by. In this post I’ll look at the problem, and examine some common solutions.
Problems with small files and HDFS
A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files,
We’ve been talking to enterprise users of Hadoop about existing and new projects, and lots of them are asking questions about reliability and data integrity. So we wrote up a short paper entitled HDFS Reliability to summarize the state of the art and provide advice. We’d like to get your feedback, too, so please leave a comment.
As a developer coming to Apache Hadoop it is important to understand how testing is organized in the project. For the most part it is simple — it’s really just a lot of JUnit tests — but there are some aspects that are not so well known.
Running Hadoop Unit Tests
Let’s have a look at some of the tests in Hadoop Core, and see how to run them. First check out the Hadoop Core source,
(Added 6/4/2013) Please note the instructions below are deprecated. Please refer to the CDH4 Security Guide for up-to-date procedures.
A few weeks ago we ran an Apache Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded on in advance. One challenge we had to face was, how do we do this in a secure way?