Welcome to the first guest post on the Cloudera blog. The other day, we saw Toby from Swingly tweeting about using Apache Hadoop to process millions of other tweeters’ tweets. We were curious, and Toby put together a great writeup about how they use Hadoop to crunch data. We have a few other guest posts in the pipeline, but if you are doing something really fun with Hadoop and want to share,
You might think that the SecondaryNameNode is a hot backup daemon for the NameNode. You’d be wrong. The SecondaryNameNode is a poorly understood component of the HDFS architecture, but it performs an important function: it periodically merges the NameNode’s edit log into the filesystem image (a checkpoint), so the NameNode has fewer edits to replay at startup and restarts faster. This blog post describes how to configure this daemon in a large-scale environment. The default Hadoop configuration places an instance of the SecondaryNameNode on the same node as the NameNode. A more scalable configuration runs the SecondaryNameNode on a separate machine.
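
To make the restart-time point concrete, here is a toy model in Python. It is only a sketch of the idea, not Hadoop code: `ToyNameNode`, `replay`, and the tuple format of the edits are all made up for illustration, and the real daemon copies the image and edit log over the network rather than sharing memory with the NameNode.

```python
from dataclasses import dataclass, field


def replay(image, edits):
    """Apply logged operations to a copy of the image and return the result."""
    namespace = dict(image)
    for op in edits:
        if op[0] == "create":
            namespace[op[1]] = op[2]
        elif op[0] == "delete":
            namespace.pop(op[1], None)
    return namespace


@dataclass
class ToyNameNode:
    """Hypothetical model: namespace = last checkpointed image + edits since."""
    image: dict = field(default_factory=dict)  # last checkpointed namespace
    edits: list = field(default_factory=list)  # operations logged since then

    def create(self, path, size):
        self.edits.append(("create", path, size))

    def delete(self, path):
        self.edits.append(("delete", path))

    def checkpoint(self):
        """What the secondary does in this model: fold edits into the image."""
        self.image = replay(self.image, self.edits)
        self.edits = []

    def restart(self):
        """Rebuild the namespace; also report how many edits had to be replayed."""
        return replay(self.image, self.edits), len(self.edits)
```

In this model, a NameNode that has logged 1,000 creates since its last checkpoint replays all 1,000 on `restart()`. After `checkpoint()` it replays none and still ends up with the same namespace. That is the whole benefit: the checkpoint shortens startup, and nothing about it makes the secondary a standby.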