MapReduce Archives - Cloudera Blog

June 10, 2019 | Technical

HDFS Erasure Coding in Production

HDFS erasure coding (EC), a major feature delivered in Apache Hadoop 3.0, is also available in CDH 6.1 for use in certain applications like Spark, Hive, and MapReduce. The development of EC has been a long collaborative effort across the wider Hadoop community. Including EC with CDH 6.1 helps customers adopt this new feature by […]

by Cloudera, Xiao Chen, Sammi Chen, Jian Zhang | 15 min read

September 4, 2015 | Technical

Untangling Apache Hadoop YARN, Part 1: Cluster and YARN Basics

In this multipart series, we fully explore the tangled ball of thread that is YARN. YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop ecosystem. YARN has been available for several releases, but many users still have fundamental questions about what YARN is, what it's for, and how it works. This […]

by Cloudera, Dennis Dawson | 6 min read

Apache Hadoop, Apache YARN, MapReduce

June 19, 2013 | Technical

Introduction to Apache HBase Snapshots, Part 2: Deeper Dive

In Part 1 of this series about Apache HBase snapshots, you learned how to use the new Snapshots feature and a bit of theory behind the implementation. Now, it’s time to dive into the technical details a bit more deeply. What is a Table? An HBase table comprises a set of metadata information and a set […]

by Cloudera | 5 min read

Apache HBase, Apache ZooKeeper, MapReduce

January 11, 2011 | Technical

How-to: Include Third-Party Libraries in Your MapReduce Job

"My library is in the classpath, but I still get a ClassNotFoundException in a MapReduce job." If you have this problem, this blog post is for you. Java requires third-party and user-defined classes to be on the command line's `-classpath` option when the JVM is launched. The `hadoop` wrapper shell script does […]

by Cloudera | 3 min read

Apache Hadoop, Apache HBase, MapReduce

February 2, 2009 | Technical

The Small Files Problem

Small files are a big problem in Hadoop, or at least they are if the number of questions on the user list on this topic is anything to go by. In this post I'll look at the problem and examine some common solutions. Problems with small files and HDFS: A small file is one […]

by Szele Balint | 4 min read

Apache Hadoop, Apache HBase, Apache HDFS, MapReduce
