Noisy Neighbors in Large, Multi-Tenant Clusters
The typical Cloudera Enterprise Data Hub cluster starts off with a few dozen nodes in the customer’s datacenter. Once configured and secured, the cluster administrator (admin) gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants start onboarding more workloads, and the admin starts onboarding more tenants. To address the increased demand, the admin adds more hardware capacity to the shared cluster.
However, not all scalability challenges can be efficiently addressed by simply adding more nodes. One of these challenges is the noisy neighbor problem, i.e., the undesirable effect of one user’s workload impacting the performance and stability of another user’s workload.
As the user and workload count increases in the shared cluster, admins become increasingly sophisticated in leveraging various resource management and multi-tenancy features that we offer within a cluster to maintain order and provide some isolation. These include setting up YARN resource pools, managing priorities, configuring pre-emption, and dedicating nodes to specific services, or scheduling certain workloads on specialized hardware (e.g. GPUs).
Despite of these levers, once clusters hit hundreds or thousands of users and workloads, admins may not be able to effectively control the ensuing chaos. For example, a less experienced user might submit an Impala SQL query that attempts to join many tables with no predicates that limit the size of the data scans. Another user on a completely different team suddenly receives a message that a query, which had been running daily for months without problems, now times out. This seemingly inexplicable failure causes frustration. If the impacted user is not changing anything in the environment from their perspective, then they expect predictability in both the performance and the stability of workloads. When we add a mix of batch, interactive, and data serving workloads to this large cluster, the problem can become intractable.
At a certain point, the admin may decide to split up the large, multi-tenant cluster into several smaller clusters with fewer tenants each to improve isolation. While this approach may offer some relief, it creates another significant challenge: duplication of data, metadata, and security policies, or a ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information. Furthermore, the admins would need to replicate all the security policies and ensure that changes are applied to all clusters. This duplication can become such a management nightmare that they may prefer to return to the original model.
A Better Approach: Virtual Private Clusters
Cloudera Manager (CM) 6.2 introduces “Virtual Private Clusters” (VPCs) to improve isolation without the downsides of data duplication. With CM 6.2, a cluster admin can create multiple, isolated, compute-only clusters that each point to one data repository, one data catalog, and one set of security policies.
To achieve this, Cloudera VPCs rely on the logical separation of compute services from base services. More specifically, distributed MapReduce, Hive, Impala, and Spark workloads run on the compute clusters, while HDFS, the Hive Metastore, and the Sentry service run on the base cluster. Multiple compute clusters can access the services in a base cluster through a new concept called the Data Context. A Data Context is simply a grouping of pointers to the base cluster services, with an easily recognizable name given by the cluster admin.
Beginning with CM 6.2, when an admin accesses the cluster creation wizard, CM presents two options: (1) create a traditional cluster, or (2) create a compute cluster. For a new compute cluster, CM will ask for the Data Context in a drop-down menu. This Data Context will provide all the necessary configuration to the compute cluster to access the shared data, metadata, and data authorization policies resident in the base cluster. This also allows administrators of compute clusters to operate independently, unaware of the details of the base cluster. Similarly, through the Data Context, base-cluster administrators can see which compute clusters are dependent on any Data Context and thus might be impacted by a maintenance window or configuration change.
Virtual Private Clusters
With VPCs, cluster admins now can provide stronger isolation guarantees between tenants and workloads. A query running in Compute Cluster 1 does not share any compute resources with a query, ETL job or streaming job running in Compute Cluster 2. Furthermore, both Compute Cluster 1 and Compute Cluster 2 can operate on the same copy of the data with a single set of security policies. This model increases predictability without creating undesirable data silos.
Multi-tenancy Strategies with Virtual Private Clusters
Prior to using Virtual Private Clusters, cluster admins need to decide on an isolation strategy for the compute-only clusters. Here are three possible strategies they may consider:
1) By user, team, or business unit
Perhaps the most obvious way of dividing up clusters relies on assigning individuals, teams, or business units to their own dedicated compute clusters. In other words, tenants map to human beings. This strategy works well for managing internal chargebacks, limiting the impact of less sophisticated users on more experienced users, and overall encouraging individuals to think about and optimize their jobs and queries now that they have a smaller (but dedicated) cluster.
2) By workload type
Splitting compute clusters by type of workload is another good strategy. This approach seeks to optimize resource utilization or infrastructure efficiency. One cluster may run on memory-optimized VMs for maximum performance of Impala SQL queries, while another cluster may run on CPU-optimized VMs for compute-intensive Spark jobs. In this case, an individual with different types of workloads may have access to multiple clusters; the admin trades increased risk of a noisy neighbor for better infrastructure efficiency.
3) By workload priority
A third strategy splits clusters based on the overall priority of the workloads running on those clusters. At one end of the spectrum, the admin may designate a few clusters as “mission critical”; these clusters are typically over-provisioned and provide strong SLA guarantees. At the other end of the spectrum, the admin may instantiate a number of low-priority dev clusters – these clusters may often run at capacity, not require performance guarantees, but also provide more agility and flexibility for experimentation.
While we present the above as independent strategies, admins can combine multiple strategies to achieve the optimal balance for their specific environments. For example, an admin may instantiate a compute-only cluster optimized for SQL analytics that is dedicated to dev/test queries coming from the finance team; this approach incorporates all 3 strategies.
In general, the more granular the admin approaches isolating users and workloads in dedicated compute-only clusters, the less likely users will impact each other. However, greater isolation diminishes the possibilities for running workloads with spare capacity, so infrastructure utilization tends to suffer. Virtual Private Clusters effectively gives administrators a knob to select the tradeoff that works best for them.
In a traditional cluster, Cloudera Manager co-locates compute and storage on the same physical infrastructure. This model allows the resource manager to schedule compute tasks on the same physical nodes that hold the data those tasks will process. In the VPC model, compute nodes are isolated from other compute nodes to provide strong isolation. Likewise, since we have one copy of the data and metadata to avoid silos, compute-to-compute isolation also necessitates separation of compute from storage.
Therefore, in the VPC model, every read and write operation from the compute-only cluster traverses the network. This loss of data locality means IT and cluster administrators should pay close attention to network capacity to avoid significant performance degradations. As the number of nodes on the compute side increases, so does the overall traffic across the network.
Designing a proper network for VPCs extends beyond the scope of this blog post. Nonetheless, we recommend that customers adopt flat spine-leaf topologies with zero-to-low oversubscription ratios across any point in the network. While 100gbs NICs are always preferable, oversubscription and cross-sectional bandwidth are much more important metrics to focus on when it comes to highly distributed, data-intensive workloads going across the network. Cloudera Manager 6.2 also includes tools for measuring network latency and cross-sectional bandwidth between compute clusters and base clusters to ensure our customers have an appropriate networking environment for using VPCs.
We realize most customers will not want to take a large multi-tenant cluster with data locality and pull all the compute workloads into remote clusters overnight (leaving the compute capacity in existing nodes unused). For this reason, Virtual Private Clusters allows admins to run both models (local and remote storage) in the same environment. We achieve this by allowing the base cluster that holds the data, metadata, and security policies to also run compute workloads on the same nodes. In other words, the base cluster acts just like any other traditional cluster with data locality. This allows the admin to keep I/O intensive and performance sensitive workloads running on the base cluster, while other workloads that require greater isolation guarantees and may have less sensitivity to I/O performance can be deployed in compute-only clusters.
Conclusion and future work
Virtual Private Clusters can significantly improve multi-tenancy management and isolation guarantees in large, shared environments, allowing users of big data infrastructure to onboard more users and workloads faster, with less risk. Rather than becoming experts on intra-cluster multi-tenancy and resource management, admins can rely on infrastructure-level isolation to separate tenants. This approach is also more in line with cloud-based deployments where tenants tend to get their own dedicated compute environment (servers, virtual machines, or containers) to run their jobs and queries on shared data residing in a centralized storage service.
While VPCs brings much needed isolation capabilities to our customers, the new Cloudera Data Platform coming soon will take this concept to a whole new level. Many of the seemingly unavoidable tradeoffs mentioned above will disappear as we adopt new infrastructure technologies providing greater efficiency and elasticity. We look forward to sharing more information about these new capabilities in the near future.
Thank you to Lakshmi Randall, Jayesh Seshadri, Christopher Davis, and Rahul Buddhisagar for reviewing and contributing your knowledge and expertise to this article.