Cluster admins will love the new cluster utilization reporting available in Cloudera Manager 5.7.
Enterprise data hub clusters are often shared by several teams. In such multi-tenant environments, cluster administrators must ensure that resources are shared fairly, so that no tenant can run jobs that starve the others. To give better visibility into resource consumption in multi-tenant environments, Cloudera Manager 5.7 (in Cloudera Enterprise Flex and Data Hub Editions) introduces a new cluster utilization reporting feature that provides information about overall cluster usage, as well as how YARN applications (including MapReduce and Apache Spark jobs) and Apache Impala (incubating) queries consume resources.
Cloudera Manager already provides a convenient user interface for configuring YARN Dynamic Resource Pools, and starting with 5.7 it also provides separate configuration for Impala Admission Control Pools. However, it can be challenging to configure the pools to match actual resource-usage patterns: consumption may shift across pools over time, and administrators must ensure that critical jobs continue to meet their SLAs. Cluster utilization reporting provides guidance for planning resources based on average as well as peak consumption.
In this post, I’ll walk you through the different parts of the report and describe how it can help administrators understand resource consumption in their clusters.
Figure 1: Overview
The Overview page has two sections:
- The top section shows the overall resource consumption by all running processes and user activities on the cluster. The report shows average, maximum, and “average daily peak” (the average of each day’s maximum value) utilization numbers.
- The bottom section reports what fraction of the resources were consumed by the YARN applications and Impala queries that were running on the cluster. It also shows the per-pool (or user) view of resource consumption.
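As a minimal Python sketch of how the “average daily peak” statistic above can be computed (the hourly sampling interval and data shape are assumptions here; Cloudera Manager’s own aggregation may differ):

```python
from collections import defaultdict
from datetime import datetime

def utilization_summary(samples):
    """Summarize (timestamp, percent_used) utilization samples.

    Returns (average, maximum, average_daily_peak), where the average
    daily peak is the mean of each day's maximum value.
    """
    values = [v for _, v in samples]
    daily_max = defaultdict(float)
    for ts, v in samples:
        day = ts.date()
        daily_max[day] = max(daily_max[day], v)
    avg = sum(values) / len(values)
    peak = max(values)
    avg_daily_peak = sum(daily_max.values()) / len(daily_max)
    return avg, peak, avg_daily_peak

# Example: two days of hourly samples
samples = [
    (datetime(2016, 4, 1, 9), 40.0), (datetime(2016, 4, 1, 14), 80.0),
    (datetime(2016, 4, 2, 9), 20.0), (datetime(2016, 4, 2, 14), 60.0),
]
avg, peak, adp = utilization_summary(samples)
# avg = 50.0, peak = 80.0, average daily peak = (80 + 60) / 2 = 70.0
```

Note how the average daily peak (70%) sits between the overall average and the absolute maximum, which is what makes it useful for capacity planning.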
The YARN report has three tabs that provide information about different aspects of configuring YARN resource pools:
Utilization Tab
Figure 2: YARN Utilization
This tab shows utilization by tenant, similar to the Overview page, but only for YARN applications. The page also shows the amount of resources that were allocated to applications but never used by them, which is wasted capacity. To reduce this waste, advise application developers to lower the resources they request when submitting applications. To identify the specific applications that are wasting resources, go to the YARN applications page in Cloudera Manager and search for them as shown in Figure 3, adding a filter on unused_memory_seconds or unused_vcore_seconds.
Figure 3: Finding applications that waste resources
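The filter above can also be mimicked programmatically; here is a hedged sketch in which the application records and thresholds are purely hypothetical, while the attribute names mirror those used by the Cloudera Manager filter:

```python
# Hypothetical application records; only the attribute names
# (unused_memory_seconds, unused_vcore_seconds) come from the CM filter.
apps = [
    {"id": "application_1", "unused_memory_seconds": 1.2e6, "unused_vcore_seconds": 900},
    {"id": "application_2", "unused_memory_seconds": 3.0e4, "unused_vcore_seconds": 10},
]

def wasteful(apps, mem_threshold, vcore_threshold):
    """Return IDs of applications whose unused (allocated but idle)
    resource-seconds exceed either threshold -- the same condition the
    Cloudera Manager filter expresses."""
    return [a["id"] for a in apps
            if a["unused_memory_seconds"] > mem_threshold
            or a["unused_vcore_seconds"] > vcore_threshold]

print(wasteful(apps, mem_threshold=1e5, vcore_threshold=500))
# ['application_1']
```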
Capacity Planning Tab
Figure 4: YARN Capacity Planning
The Capacity Planning tab for YARN contains information that is useful for determining whether the available resources are sufficient for your workloads. It shows a new metric called Wait Ratio During Contention: the fraction of containers in a pool that are in the pending state, meaning they are waiting in line for resources. Typically, you want a very low wait ratio for pools that run critical jobs. To reduce a pool’s wait ratio, increase the resources allocated to that pool. If that is not possible (for example, because many pools have high wait ratios), you should add more worker nodes for running YARN applications.
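As a rough illustration of the metric, a sketch that simply divides pending containers by total containers (the exact formula Cloudera Manager uses, including how contention periods are windowed, may differ):

```python
def wait_ratio(pending_containers, running_containers):
    """Fraction of a pool's containers that are pending (waiting for
    resources) rather than running. A sketch of the 'Wait Ratio During
    Contention' idea, not Cloudera Manager's exact computation."""
    total = pending_containers + running_containers
    return pending_containers / total if total else 0.0

# A pool with 5 pending and 15 running containers:
# wait_ratio(5, 15) == 0.25
```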
Preemption Tuning Tab
Figure 5: YARN Preemption Tuning
Preemption allows YARN to evict containers from pools that are consuming more than their share of resources, so that applications in other pools can make progress. The report shows the average resources that were allocated to a pool during times of contention (defined as times when the pool had at least one pending container), along with the steady and instantaneous fair share for that pool. If the resources allocated during contention are less than the steady fair share, make the preemption settings more aggressive so that applications in that pool reach their fair share more quickly. More information is available in the documentation for Enabling and Disabling Fair Scheduler Preemption.
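One way to make preemption more aggressive is to lower the fair-share preemption timeouts in fair-scheduler.xml; a sketch with illustrative pool names and values (preemption itself is enabled separately via yarn.scheduler.fair.preemption in yarn-site.xml):

```xml
<!-- fair-scheduler.xml: illustrative preemption settings, not a
     recommended configuration. Tune values to your workloads. -->
<allocations>
  <!-- Preempt on behalf of any starved pool after 60s below fair share -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <queue name="critical">
    <!-- Hypothetical critical pool reclaims its fair share faster,
         and is considered starved below 80% of its fair share -->
    <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
</allocations>
```

In Cloudera Manager these values are normally set through the Dynamic Resource Pools UI rather than by editing the file directly.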
The Impala report helps you tune Admission Control pool settings to make user queries as efficient as possible. It consists of three tabs: Queries, Peak Memory Usage, and Spilled Memory.
Queries Tab
Figure 6: Impala Queries
The Queries tab shows the percentage of queries that were successful and the percentage that failed due to Admission Control. More specifically, the report breaks down the number of queries that failed because insufficient memory was available to execute them, or because too many queries were submitted to the pool at the same time. If important queries are failing for these reasons, increase the corresponding limits configured for those pools.
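A rough way to triage those two failure reasons, as a sketch; the 5% threshold and the advice strings are purely illustrative, not part of the report:

```python
def admission_tuning_hints(rejected_mem, rejected_queue_full, total):
    """Map the report's two Admission Control failure reasons to tuning
    suggestions. Threshold and wording are illustrative assumptions."""
    hints = []
    if total == 0:
        return hints
    if rejected_mem / total > 0.05:
        hints.append("raise the pool's max memory")
    if rejected_queue_full / total > 0.05:
        hints.append("raise max running/queued queries for the pool")
    return hints

# 100 queries: 10 failed for memory, 1 for a full queue
# -> ["raise the pool's max memory"]
```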
Peak Memory Usage Tab
Figure 7: Impala Peak Memory Usage
The Peak Memory Usage tab helps you with capacity planning for Impala by showing the peak memory consumption during the reporting window. The report has overall, as well as per-tenant, peak usage numbers. If peak utilization approaches the resources available in the cluster, you should increase the number of worker nodes running Impala Daemons.
Spilled Memory Tab
Figure 8: Impala Spilled Memory
The Spilled Memory tab in the Impala report shows the amount of memory spilled to disk per hour in different pools. Spilling to disk can severely degrade the performance of Impala queries, so if pools running important queries show large disk spills, mitigate the issue by following the instructions in the documentation about SQL Operations that Spill to Disk.
Resource management in a multi-tenant environment is a challenging task for administrators. Cluster utilization reports provide detailed guidance about usage patterns and let administrators take the most important factors into account when setting pool configurations.
Vikram Srivastava is a Software Engineer at Cloudera.