Tag Archives: system administrators

5 Pitfalls of Benchmarking Big Data Systems

Categories: Hadoop Performance

Benchmarking Big Data systems is nontrivial. Avoid these traps!

Here at Cloudera, we know how hard it is to get reliable performance benchmarking results. Benchmarking matters because one of the defining characteristics of Big Data systems is the ability to process large datasets faster. “How large” and “how fast” drive technology choices, purchasing decisions, and cluster operations. Even with the best intentions, performance benchmarking is fraught with pitfalls—easy to get numbers,

Read more

How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop

Categories: Hadoop HDFS

One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data — we prefer to move the computation to the data whenever possible, rather than the other way around. Because of this, the Hadoop Distributed File System (HDFS) typically handles many “local reads” reads where the reader is on the same node as the data:

Initially, local reads in HDFS were handled the same way as remote reads: the client connected to the DataNode via a TCP socket and transferred the data via DataTransferProtocol.

Read more

Get a Free Hadoop Operations Ebook with Administrator Training

Categories: Books General Hadoop Training

Start the year off with bigger questions by taking advantage of Cloudera University’s special offer for aspiring Hadoop administrators. All participants who complete a Cloudera Administrator Training for Apache Hadoop public course by the end of March 2013 will receive a free digital copy of Hadoop Operations by Eric Sammer. If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must.

Exploring Compression for Hadoop: One DBA’s Story

Categories: General Guest HDFS Ops and DevOps

This guest post comes to us courtesy of Gwen Shapira (@gwenshap), a database consultant for The Pythian Group (and an Oracle ACE Director).

Most western countries use street names and numbers to navigate inside cities. But in Japan, where I live now, very few streets have them.

Sometimes solving technical problems is similar to navigating a city without many street names: Once you arrive at the desired location,

Read more

NameNode Recovery Tools for the Hadoop Distributed File System

Categories: HDFS

Warning: The procedure described below can cause data loss. Contact Cloudera Support before attempting it.

Most system administrators have had to deal with a bad hard disk at some point. One moment, the hard disk is a mechanical marvel; the next, it is an expensive paperweight.

The HDFS (Hadoop Distributed File System) community has been steadily working to diminish the impact of disk failures on overall system availability. In this article,

Read more