Struggling to Manage Your Multi-Tenant Environments? Use Chargeback!

If your organization runs multi-tenant big data clusters (and everyone should), do you know how much of the cluster's resources each tenant uses, and how cost-efficiently? A chargeback or showback model lets IT attribute costs and resource usage to the actual analytic users of the multi-tenant cluster, instead of to the platform (“overhead”) or the IT department. You can then see the cost of each tenant and set limits to control overall costs.

Big data clusters have become a necessity in the modern business environment. Typically, such clusters are multi-tenant, shared by different lines of business and groups. Planning and setting up a multi-tenant big data cluster involves several considerations, the most important being the need to understand each tenant's usage and cost of the cluster's resources, so that cluster costs can be optimized in the public cloud or on premises.

For administrators and platform owners, the ability to understand infrastructure usage and cost by tenant involves several aspects. For example:

  1. As an administrator, how can I get visibility into users' actual usage of the available resources, so that I can plan and provision resources accordingly?
  2. To encourage effective use of resources, how do I make users aware of the costs associated with them?
  3. How do I attribute costs based only on actual resource consumption, rather than charging all departments for all resources?
  4. How do I charge different rates for different kinds of resources, such as CPU, memory, network, and so on?

This blog post covers the chargeback reporting feature in Workload Manager, which helps you answer frequently asked questions like the ones listed above.

Chargeback in Workload Manager

With the above use cases in mind, we added support for chargeback reports in Workload Manager (WXM).

For an IT department or system administration group that provides services to internal users, the ability to check cluster usage by department and view chargeback reports based on actual usage is valuable. In large organizations, where there might be hundreds of users across tens of departments, WXM chargeback reports allow IT departments to charge (or notionally show back) the end users of the big data clusters for the resources they consumed.

Generating chargeback reports in Workload Manager

Using the chargeback feature, administrators decide which clusters to track charges for. For example, an organization may have a few high-end production clusters whose usage should be charged, alongside dev and QA clusters that need not be. Administrators also set the rates for CPU and memory consumption, by type of workload, on the clusters to be charged. Resource managers such as YARN divide the available CPU and memory between the jobs that run on a cluster. WXM chargeback reporting measures CPU usage in CPU core hours and memory usage in memory GB hours. These units capture how much CPU and memory was allocated to a job and for how long.
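To make the units concrete, here is a minimal sketch of how a charge could be derived from core hours and GB hours. The `Rates` class, the `job_cost` function, and the rate values are hypothetical and only illustrate the arithmetic; they are not WXM's API or its exact formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit rates an administrator might configure."""
    per_cpu_core_hour: float
    per_memory_gb_hour: float


def job_cost(allocated_cores: float, allocated_memory_gb: float,
             duration_hours: float, rates: Rates) -> dict:
    """Turn a job's allocation and duration into usage units and a charge."""
    # Usage is allocation multiplied by the time the allocation was held.
    cpu_core_hours = allocated_cores * duration_hours
    memory_gb_hours = allocated_memory_gb * duration_hours
    cost = (cpu_core_hours * rates.per_cpu_core_hour
            + memory_gb_hours * rates.per_memory_gb_hour)
    return {
        "cpu_core_hours": cpu_core_hours,
        "memory_gb_hours": memory_gb_hours,
        "cost": cost,
    }


# Assumed example values: 4 cores and 16 GB held for 2.5 hours.
rates = Rates(per_cpu_core_hour=0.05, per_memory_gb_hour=0.01)
usage = job_cost(allocated_cores=4, allocated_memory_gb=16,
                 duration_hours=2.5, rates=rates)
print(usage["cpu_core_hours"], usage["memory_gb_hours"], round(usage["cost"], 2))
```

With these assumed numbers the job accounts for 10 CPU core hours and 40 memory GB hours. Separate rates per resource let an administrator price CPU and memory independently.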

Define chargeback settings

WXM lets administrators define cost centers based on criteria such as user or pool. Using these criteria, users can be grouped together for the purpose of chargeback. For example, members of a particular department such as sales can be added to a ‘sales’ cost center, and so on.
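The grouping described above can be pictured as a simple matching rule. The sketch below is illustrative only: the `CostCenter` class, the `assign_cost_center` function, the user names, and the pool names are all hypothetical, and WXM's real matching rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class CostCenter:
    """Hypothetical cost center defined by user names and/or scheduler pools."""
    name: str
    users: set = field(default_factory=set)
    pools: set = field(default_factory=set)

    def matches(self, job: dict) -> bool:
        # A job belongs to the cost center if its user or its pool is listed.
        return job.get("user") in self.users or job.get("pool") in self.pools


def assign_cost_center(job: dict, cost_centers: list,
                       default: str = "unassigned") -> str:
    """Return the name of the first matching cost center, or a default."""
    for cost_center in cost_centers:
        if cost_center.matches(job):
            return cost_center.name
    return default


cost_centers = [
    CostCenter("sales", users={"alice", "bob"}),
    CostCenter("marketing", pools={"root.marketing"}),
]

print(assign_cost_center({"user": "alice", "pool": "root.default"}, cost_centers))
print(assign_cost_center({"user": "carol", "pool": "root.marketing"}, cost_centers))
```

In this sketch the first matching cost center wins, which is an assumption made to keep the example short.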

Create cost centers

Once the cost centers are available, users can view chargeback reports for those cost centers over a time period of interest. These reports provide insight into resource consumption for all the jobs belonging to the cost center, along with the cost of each job. Users can drill down further to check the top N costliest jobs, costs by engine type, costs by scheduler pool, and so on.
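The drill-downs mentioned above are ordinary aggregations over per-job costs. The following sketch uses made-up job records and hypothetical helper names (`top_n_costly`, `cost_by`) to show the idea; it does not reflect how WXM stores or queries its data.

```python
import heapq
from collections import defaultdict

# Made-up job records for one cost center; the fields are assumptions.
jobs = [
    {"id": "job-1", "engine": "spark", "pool": "root.sales", "cost": 12.5},
    {"id": "job-2", "engine": "hive", "pool": "root.sales", "cost": 3.0},
    {"id": "job-3", "engine": "spark", "pool": "root.adhoc", "cost": 7.25},
    {"id": "job-4", "engine": "impala", "pool": "root.adhoc", "cost": 1.5},
]


def top_n_costly(jobs: list, n: int) -> list:
    """Return the n jobs with the highest cost, most expensive first."""
    return heapq.nlargest(n, jobs, key=lambda job: job["cost"])


def cost_by(jobs: list, key: str) -> dict:
    """Sum job costs grouped by a field such as 'engine' or 'pool'."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job[key]] += job["cost"]
    return dict(totals)


print([job["id"] for job in top_n_costly(jobs, 2)])
print(cost_by(jobs, "engine"))
print(cost_by(jobs, "pool"))
```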

Cost centers list

Resource consumption by clusters for a cost center

Cost center usage drill down

WXM also lets users examine a detailed analysis of each job to answer questions such as: Why did the job take so long to run? Does this job always run this slowly, compared with a baseline drawn from its historical runs? WXM identifies health issues and gives recommendations for improving the job's performance, which can ultimately reduce cost. For example:

Analysis of a Spark Job

 

Health check recommendations

The above analysis ultimately helps users optimize their jobs and the resources those jobs consume. Check out the documentation for the exact steps to use the chargeback feature to optimize your costs.
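As a rough illustration of comparing a run against its history, the sketch below flags a run that is much slower than the median of earlier runs. The post does not describe how WXM computes its baseline, so the use of the median, the `slow_factor` threshold of 1.5, and the function name are all assumptions.

```python
from statistics import median


def compare_to_baseline(current_seconds: float, historical_seconds: list,
                        slow_factor: float = 1.5):
    """Compare one run's duration with the median of its historical runs.

    Returns None when there is no usable history to compare against.
    """
    if not historical_seconds:
        return None
    baseline = median(historical_seconds)
    if baseline <= 0:
        return None
    ratio = current_seconds / baseline
    return {
        "baseline_seconds": baseline,
        "ratio": ratio,
        # Assumed rule: flag runs slower than slow_factor times the baseline.
        "slower_than_usual": ratio > slow_factor,
    }


# Made-up durations in seconds for earlier runs of the same job.
history = [100, 120, 110, 105, 115]
print(compare_to_baseline(220, history))
```

A median is used here because a single unusually slow historical run would otherwise skew the baseline.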

Summary

Using Workload Manager’s chargeback feature, administrators can start implementing a chargeback process for their multi-tenant CDP clusters. Big data developers and administrators can also use Workload Manager to spot health issues and apply the provided recommendations to improve both job execution time and resource efficiency. In the future, we intend to add shared resources such as network and data usage to the chargeback feature.

Shirish Deshmukh