Multi-host SecondaryNameNode Configuration

Categories: Hadoop HDFS

You might think that the SecondaryNameNode is a hot backup daemon for the NameNode. You’d be wrong. The SecondaryNameNode is a poorly understood component of the HDFS architecture, but one which provides the important function of lowering NameNode restart time. This blog post describes how to configure this daemon in a large-scale environment. The default Hadoop configuration places an instance of the SecondaryNameNode on the same node as the NameNode. A more scalable configuration involves configuring the SecondaryNameNode on a different machine.

Read more

Securing an Apache Hadoop Cluster Through a Gateway

Categories: General Hadoop

(Added 6/4/2013) Please note the instructions below are deprecated. Please refer to the CDH4 Security Guide for up-to-date procedures.

A few weeks ago we ran an Apache Hadoop hackathon. ApacheCon participants were invited to use our 10-node Hadoop cluster to explore Hadoop and play with some datasets that we had loaded on in advance. One challenge we had to face was, how do we do this in a secure way?

Read more