Andrew Wang, Author at Cloudera Blog

September 23, 2015 | Technical

Introduction to HDFS Erasure Coding in Apache Hadoop

Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compared to replication while maintaining the same durability guarantees. This post explains how it works. HDFS by default replicates each block three times. Replication provides a simple and robust form of redundancy to shield against most failure scenarios. It also eases […]

by Andrew Wang | 13 min read

Apache Hadoop, Apache HDFS

March 5, 2014 | Technical

A Guide to Checkpointing in Hadoop

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one. Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source […]

by Andrew Wang | 8 min read

Apache Hadoop, Apache HDFS, Ops and DevOps
