Author Archives: Colin McCabe

How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop

Categories: Hadoop, HDFS

One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data; we prefer to move the computation to the data whenever possible, rather than the other way around. Because of this, the Hadoop Distributed File System (HDFS) typically handles many “local reads,” that is, reads where the reader is on the same node as the data.

Initially, local reads in HDFS were handled the same way as remote reads: the client connected to the DataNode via a TCP socket and transferred the data via DataTransferProtocol.
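
To make the contrast concrete, here is a toy sketch in Python. It is not HDFS code and does not implement DataTransferProtocol: the function names (`serve_block_once`, `read_via_tcp`, `read_directly`, `demo`) and the temporary "block" file are assumptions made up for illustration. One thread plays a stand-in "DataNode" that streams a file over a loopback TCP socket, while the other path simply opens the same file on local disk.

```python
import os
import socket
import tempfile
import threading


def serve_block_once(server_sock, block_path):
    """Hypothetical stand-in for a DataNode: accept one connection, stream the file."""
    conn, _ = server_sock.accept()
    with conn, open(block_path, "rb") as block:
        conn.sendall(block.read())
    # Leaving the with-block closes the connection, which signals EOF to the reader.


def read_via_tcp(port):
    """Read the block through a loopback TCP socket, as a remote-style read would."""
    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # empty bytes means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def read_directly(block_path):
    """Read the same bytes straight from the local file, bypassing the socket."""
    with open(block_path, "rb") as block:
        return block.read()


def demo():
    # Create a 1 MiB temporary file to play the role of a block on local disk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(1 << 20))
        block_path = f.name

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(
        target=serve_block_once, args=(server, block_path), daemon=True
    )
    worker.start()
    try:
        via_tcp = read_via_tcp(port)
        worker.join()
        direct = read_directly(block_path)
    finally:
        server.close()
        os.remove(block_path)
    return via_tcp, direct


if __name__ == "__main__":
    via_tcp, direct = demo()
    print("same bytes:", via_tcp == direct, "size:", len(direct))
```

Both paths return identical bytes, but the socket path pushes every byte through a second thread and the loopback network stack even though the file already sits on the same machine.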


NameNode Recovery Tools for the Hadoop Distributed File System

Categories: HDFS

Warning: The procedure described below can cause data loss. Contact Cloudera Support before attempting it.

Most system administrators have had to deal with a bad hard disk at some point. One moment, the hard disk is a mechanical marvel; the next, it is an expensive paperweight.

The HDFS (Hadoop Distributed File System) community has been steadily working to diminish the impact of disk failures on overall system availability. In this article,
