New in Cloudera Manager 5.7: Cluster Utilization Reporting

Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.

Enterprise data hub clusters often are shared by several teams. In such multi-tenant environments, cluster administrators are required to ensure that resources are shared fairly so that one tenant cannot run jobs that starve others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) has a new feature for reporting cluster utilization that provides information about overall cluster usage, as well as how YARN applications (including MapReduce and Apache Spark jobs) and Apache Impala (incubating) queries are consuming resources.

Cloudera Manager already provides a convenient user interface to configure YARN Dynamic Resource Pools, and starting with 5.7, Cloudera Manager has also introduced separate configuration for Impala Admission Control Pools. However, it can be challenging to configure the pools to match actual resource-usage patterns; consumption patterns may change across different pools over time, and the admin must ensure that critical jobs continue to meet their SLAs. Cluster utilization reporting provides guidance about how to plan for resources based on average, as well as peak, consumption.

In this post, I’ll walk you through the different parts of the reporting and describe how it can help admins understand resource consumption in their clusters.

Overview Page

Figure 1: Overview

The Overview page has two sections:

  • The top section shows the overall resource consumption by all running processes and user activities on the cluster. The report has average, maximum, and “average daily peak” (computed by taking the average of daily maximum values) utilization numbers.
  • The bottom section reports what fraction of the resources were consumed by the YARN applications and Impala queries that were running on the cluster. It also shows the per-pool (or user) view of resource consumption.
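To make the three summary numbers concrete, here is a minimal sketch of how they relate to each other. It assumes utilization arrives as timestamped percentage samples; the function name, the sample format, and the data are all hypothetical and are not how Cloudera Manager stores or computes these values internally.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def utilization_summary(samples):
    """Summarize (timestamp, utilization_percent) samples.

    Hypothetical helper: only the definition of "average daily peak"
    (the average of the daily maximum values) comes from the post.
    """
    samples = list(samples)
    values = [value for _, value in samples]

    # Track the highest sample seen on each calendar day.
    daily_max = defaultdict(float)
    for timestamp, value in samples:
        day = timestamp.date()
        daily_max[day] = max(daily_max[day], value)

    return {
        "average": mean(values),
        "maximum": max(values),
        "average_daily_peak": mean(list(daily_max.values())),
    }


# Made-up samples covering two days.
samples = [
    (datetime(2016, 4, 1, 9), 20.0),
    (datetime(2016, 4, 1, 15), 80.0),
    (datetime(2016, 4, 2, 9), 30.0),
    (datetime(2016, 4, 2, 15), 60.0),
]
summary = utilization_summary(samples)
```

With these samples the average is 47.5, the maximum is 80.0, and the average daily peak is 70.0 (the mean of the two daily maxima, 80.0 and 60.0). The average daily peak is less sensitive to a single unusual spike than the overall maximum.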

YARN Reporting

The YARN report has three tabs that provide information about different aspects of configuring YARN resource pools:

Utilization Tab

Figure 2: YARN Utilization

This tab shows utilization by tenant, similar to the Overview page, but only for YARN applications. The page also shows the amount of resources that were allocated to applications but not used by them, which is wasted capacity. To reduce this waste, advise application developers to request fewer resources when they submit their applications. To find the specific applications that are wasting resources, go to the YARN applications page in Cloudera Manager and search for them, as shown in Figure 3, by adding a filter on unused_memory_seconds or unused_vcore_seconds.

Figure 3: Finding applications that waste resources
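The same ranking can be sketched offline once per-application numbers have been exported. The example below assumes that unused memory is simply allocated minus used memory-seconds; the field names allocated_memory_seconds and used_memory_seconds, the helper functions, and the data are hypothetical, and only the attribute name unused_memory_seconds comes from the filter described above.

```python
def unused_memory_seconds(app):
    """Allocated-but-unused memory for one application.

    Assumed definition: allocated minus used, never below zero.
    """
    return max(app["allocated_memory_seconds"] - app["used_memory_seconds"], 0)


def top_wasters(apps, limit=3):
    """Return the applications with the most unused memory, worst first."""
    return sorted(apps, key=unused_memory_seconds, reverse=True)[:limit]


# Made-up applications; the values are MB-seconds.
apps = [
    {"id": "etl_nightly", "allocated_memory_seconds": 4_000_000, "used_memory_seconds": 1_000_000},
    {"id": "adhoc_report", "allocated_memory_seconds": 500_000, "used_memory_seconds": 450_000},
    {"id": "spark_ml", "allocated_memory_seconds": 2_000_000, "used_memory_seconds": 1_900_000},
]
worst = top_wasters(apps, limit=2)
```

Here etl_nightly leaves 3,000,000 MB-seconds unused and would be the first candidate for a smaller resource request.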

Capacity Planning Tab

Figure 4: YARN Capacity Planning

The Capacity Planning tab for YARN contains information that is useful for determining whether the available resources are sufficient for your workloads. It shows a new metric called Wait Ratio During Contention, which is the fraction of containers in a pool that are in the pending state, meaning they are waiting in line to get resources. Typically, you’d want a very low wait ratio for pools that are running critical jobs. To reduce the wait ratio for a pool, you can increase the resources allocated to the pool. If that is not possible (for example, because many pools have a high wait ratio), you should add more worker nodes for running YARN applications.
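As a rough illustration of the metric, the sketch below averages the pending fraction over the samples in which a pool had at least one pending container. The sampling format and the function name are hypothetical, and the exact averaging that Cloudera Manager applies may differ.

```python
def wait_ratio_during_contention(samples):
    """Average fraction of pending containers, over contended samples only.

    samples: iterable of (running_containers, pending_containers) pairs
    for one pool. Samples without pending containers are ignored, since
    the pool was not under contention at that moment.
    """
    ratios = [
        pending / (running + pending)
        for running, pending in samples
        if pending > 0
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Made-up samples: no contention, then 2 of 8 pending, then 5 of 10 pending.
samples = [(10, 0), (6, 2), (5, 5)]
ratio = wait_ratio_during_contention(samples)
```

For these samples the result is 0.375: the uncontended sample is skipped, and the remaining ratios of 0.25 and 0.5 are averaged.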

Preemption Tuning Tab

Figure 5: YARN Preemption Tuning

Preemption allows YARN to evict containers from pools that are over-consuming resources so that applications in other pools can make progress. The report shows the average resources that were allocated to the pool during times of contention (defined as times when there was at least one pending container in the pool), along with the steady and instantaneous fair share for that pool. If the allocated resources are less than the steady fair share during contention, you should make the preemption settings more aggressive so that applications in that pool can get their fair share more quickly. More information about configuring preemption can be found in the documentation for Enabling and Disabling Fair Scheduler Preemption.
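The rule of thumb above is a simple comparison per pool. The sketch below applies it to a small table of numbers copied by hand from the report; the dictionary layout, the key names, and the pool names are hypothetical.

```python
def pools_below_fair_share(pools):
    """Return the names of pools that received less than their steady fair
    share during contention, which are the candidates for more aggressive
    preemption settings.
    """
    return sorted(
        name
        for name, stats in pools.items()
        if stats["allocated_during_contention"] < stats["steady_fair_share"]
    )


# Made-up numbers, in the same unit for both columns (for example, vcores).
pools = {
    "etl": {"allocated_during_contention": 40.0, "steady_fair_share": 60.0},
    "adhoc": {"allocated_during_contention": 30.0, "steady_fair_share": 25.0},
}
flagged = pools_below_fair_share(pools)
```

Only the etl pool is flagged here, because it averaged 40 against a steady fair share of 60 while it had pending containers.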

Impala Reporting

The Impala report helps you tune Admission Control pool settings to make user queries as efficient as possible. It consists of three tabs: Queries, Peak Memory Usage, and Spilled Memory.

Queries Tab

Figure 6: Impala Queries

The Queries tab shows the percentage of queries that succeeded and the percentage that failed because of Admission Control. More specifically, the report contains the number of queries that failed because insufficient memory was available to execute them, or because too many queries were submitted to the pool at the same time. If important queries are failing for these reasons, increase the limits configured for these pools.
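The percentages in this tab are a breakdown of query outcomes per pool. A minimal sketch of that breakdown, assuming each query has already been labeled with one outcome, is shown below; the outcome labels and the function name are hypothetical and do not correspond to Impala or Cloudera Manager identifiers.

```python
from collections import Counter


def admission_summary(outcomes):
    """Return the percentage of queries per outcome label."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * count / total for label, count in counts.items()}


# Made-up outcomes for ten queries in one pool.
outcomes = (
    ["succeeded"] * 8
    + ["rejected_insufficient_memory"]
    + ["rejected_too_many_queries"]
)
percentages = admission_summary(outcomes)
```

In this example 80% of the queries succeeded and 10% failed for each of the two Admission Control reasons, which would suggest reviewing both the memory limit and the concurrency limit of the pool.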

Peak Memory Usage Tab

Figure 7: Impala Peak Memory Usage

The Peak Memory Usage tab helps you with capacity planning for Impala by showing the peak consumption of memory in the reporting window. The report has overall, as well as per-tenant, peak usage numbers. If peak utilization gets close to the available resources in the cluster, you should increase the number of worker nodes running Impala Daemons.

Spilled Memory Tab

Figure 8: Impala Spilled Memory

The Spilled Memory tab in the Impala report shows the amount of memory spilled per hour in different pools. Spilling to disk can severely degrade the performance of Impala queries. If you see pools running important queries with large disk spills, mitigate the problem by following the instructions in the documentation about SQL Operations that Spill to Disk.

Conclusion

Resource management in a multi-tenant environment is a challenging task for administrators. Cluster utilization reports provide detailed guidance about usage patterns and let administrators take into account the most important factors while setting pool configurations.

You can find more information about how to use this feature in the documentation for Cluster Utilization Reports. You can also check out the videos about this feature for YARN and for Impala.

Vikram Srivastava is a Software Engineer at Cloudera.

5 responses on “New in Cloudera Manager 5.7: Cluster Utilization Reporting”

  1. Buntu

    I’ve upgraded CDH/CM to v5.7.0 but am not able to find the cluster utilization report. Does it need to be enabled anywhere? Thanks!

    1. Justin Kestelyn Post author

      Clarification (since added to post): this feature is available in paid versions of Cloudera Enterprise only (not Express).

  1. Lijju

    We have a paid version of Cloudera at our organization. The version we are currently using is 5.5, and we will be upgrading to 5.7.
    Can we automate the reporting? I want a report with details for the past week sent to the team. Can it be done?

    1. Justin Kestelyn Post author

      This is not currently possible, but it is under consideration as a roadmap item. Thanks for the requirement!