Category Archives: YARN

Untangling Apache Hadoop YARN, Part 5: Using FairScheduler queue properties

Categories: Hadoop, YARN

In Part 4, we described the most commonly used FairScheduler properties in Apache Hadoop. In Part 5, we’ll provide examples that show how these properties can be used, individually and in combination, to achieve commonly desired behavior such as prioritizing applications and organizing queues.

Example: Best Effort Queue

Summary: Create a “best effort” queue that runs applications when the cluster is underutilized.  

Implementation: In FairScheduler,


Untangling Apache Hadoop YARN, Part 4: Fair Scheduler Queue Basics

Categories: Hadoop, YARN

In this installment, we provide insight into how the Fair Scheduler works, and why it works the way it does.

In Part 3 of this series, you got a quick introduction to Fair Scheduler, one of the scheduler choices in Apache Hadoop YARN (and the one recommended by Cloudera). In Part 4, we will cover most of the queue properties, some examples of their use, and their limitations.


New in Cloudera Manager 5.7: Cluster Utilization Reporting

Categories: Cloudera Manager, Impala, Ops and DevOps, Performance, YARN

Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.

Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly so that one tenant cannot run jobs that starve others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage,


Untangling Apache Hadoop YARN, Part 3: Scheduler Concepts

Categories: YARN

In Parts 1 and 2, we covered the basics of YARN resource allocation. In this installment, we’ll provide an overview of cluster scheduling and introduce the Fair Scheduler, one of the scheduler choices available in YARN.

A standalone computer can have several CPU cores, each running a single process at a time, yet as many as a few hundred processes can be active simultaneously. The scheduler is the part of the computer’s operating system that assigns a process to a CPU core to run for a short period of time.
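
The time-slicing idea can be illustrated with a toy simulation. The sketch below is only an illustration of round-robin scheduling, not how any real operating system or YARN scheduler is implemented; the function name `round_robin` and its parameters are hypothetical names chosen for this example.

```python
from collections import deque


def round_robin(processes, num_cores, time_slice):
    """Simulate round-robin time slicing (a simplified, hypothetical model).

    processes: dict mapping process name -> remaining work units.
    Returns a list of ticks; each tick is a list of
    (core, process_name, units_run) tuples.
    """
    ready = deque(processes.items())
    timeline = []
    while ready:
        # Hand each core the next waiting process, if one is available.
        running = [ready.popleft() for _ in range(min(num_cores, len(ready)))]
        tick = []
        for core, (name, remaining) in enumerate(running):
            ran = min(time_slice, remaining)
            tick.append((core, name, ran))
            if remaining > ran:
                # Unfinished work goes to the back of the queue.
                ready.append((name, remaining - ran))
        timeline.append(tick)
    return timeline


timeline = round_robin({"a": 3, "b": 1, "c": 2}, num_cores=2, time_slice=2)
for number, tick in enumerate(timeline, start=1):
    print(number, tick)
```

With three processes and only two cores, process `c` waits during the first tick and process `a` needs a second turn to finish, which is the kind of sharing a scheduler arranges.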


Untangling Apache Hadoop YARN, Part 2: Global Configuration Basics

Categories: YARN

A new installment in the series about the tangled ball of thread that is YARN.

In Part 1 of this series, we covered the fundamentals of YARN clusters. In Part 2, you’ll learn about other components that can run on a cluster and how they affect YARN cluster configuration.

Ideal YARN Allocation

As shown in the previous post, a YARN cluster can be configured to use up all the resources on the cluster.
