Category Archives: YARN

Untangling Apache Hadoop YARN, Part 3: Scheduler Concepts

Categories: YARN

In Parts 1 and 2, we covered the basics of YARN resource allocation. In this installment, we’ll provide an overview of cluster scheduling and introduce the Fair Scheduler, one of the scheduler choices available in YARN.

A standalone computer can have several CPU cores, each running a single process, but there can be as many as a few hundred processes running simultaneously. The scheduler is a part of the desktop’s operating system that assigns a process to a CPU core to run for a short period of time.

Read More

Untangling Apache Hadoop YARN, Part 2: Global Configuration Basics

Categories: YARN

A new installment in the series about the tangled ball of thread that is YARN

In Part 1 of this series, we covered the fundamentals of clusters of YARN. In Part 2, you’ll learn about other components than can run on a cluster and how they affect YARN cluster configuration.

Ideal YARN Allocation

As shown in the previous post, a YARN cluster can be configured to use up all the resources on the cluster.

Read More

Untangling Apache Hadoop YARN, Part 1: Cluster and YARN Basics

Categories: Hadoop MapReduce YARN

In this multipart series, fully explore the tangled ball of thread that is YARN.

YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop ecosystem. YARN has been available for several releases, but many users still have fundamental questions about what YARN is, what it’s for, and how it works. This new series of blog posts is designed with the following goals in mind:

  • Provide a basic understanding of the components that make up YARN
  • Illustrate how a MapReduce job fits into the YARN model of computation.

Read More

How-to: Secure YARN Containers with Cloudera Navigator Encrypt

Categories: Cloudera Navigator Security YARN

Learn how Cloudera Navigator Encrypt bring data security to YARN containers.

With the introduction of transparent data encryption in HDFS, we are now a big step closer toward a secure platform in the Apache Hadoop world. However, there are still gaps in the platform, including how YARN and its applications manage their cache. In this post, I’ll explain how Cloudera Navigator Encrypt fills that particular gap.

Use Case

When a YARN application runs in a cluster it can sometimes spill data to the hard disk,

Read More

Apache Spark Resource Management and YARN App Models

Categories: Spark YARN

A concise look at the differences between how Spark and MapReduce manage cluster resources under YARN

The most popular Apache YARN application after MapReduce itself is Apache Spark. At Cloudera, we have worked hard to stabilize Spark-on-YARN (SPARK-1101), and CDH 5.0.0 added support for Spark on YARN clusters.

In this post, you’ll learn about the differences between the Spark and MapReduce architectures, why you should care,

Read More