In part one of this blog post (read here) we introduced CDP Private Cloud, including its five key components. In this second part, we focus on how this new architecture helps achieve agility in the enterprise. Remember the BI analyst who couldn’t get consistent performance due to noisy neighbors, the data scientist who had to wait months to get the latest Spark version, and the application developer who couldn’t get new hardware? Let’s see how CDP Private Cloud tackles these challenges:
1. Simplified Multi-tenancy: our customers tell us that traditional Hadoop-based multi-tenancy and resource management controls have become complex. Let’s take YARN for example. A cluster administrator needs to learn YARN’s implementation of capacity schedulers, priority queues, resource pools, and pre-emption capabilities to maintain order in the cluster. Few people in the world are familiar with these concepts, and even fewer know them well enough to manage a thousand users sharing a cluster. It’s not surprising we see a rise in support cases related to ‘noisy neighbors’, and platform administrators taking extra precautions before onboarding that next tenant. These precautions impact agility and slow how quickly use cases can be onboarded.
With CDP Private Cloud we move away from application-specific tenancy controls to the same tenancy controls used by other applications on Kubernetes. Instead of deploying a massive service shared by hundreds of users with service-specific multi-tenancy controls, the administrators now deploy smaller instances of the service, each one shared by fewer users. Each instance requests cores and memory from a shared Kubernetes cluster, which means the admins can now use the same standard controls that they use to manage every other non-Cloudera application running on Kubernetes.
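To make the idea of standard, per-tenant controls concrete, here is a minimal Python sketch of quota-based admission. It is loosely modeled on the concept of a Kubernetes ResourceQuota and is not Cloudera or Kubernetes code: the `Quota` and `Tenant` classes, the `admit` method, and the tenant name and numbers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical per-tenant ceiling on cores and memory."""
    cpu_cores: float
    memory_gib: float


@dataclass
class Tenant:
    """Hypothetical tenant that tracks what its service instances have requested."""
    name: str
    quota: Quota
    used_cpu: float = 0.0
    used_mem: float = 0.0

    def admit(self, cpu_cores: float, memory_gib: float) -> bool:
        """Admit a new service instance only if it fits under the tenant quota."""
        over_cpu = self.used_cpu + cpu_cores > self.quota.cpu_cores
        over_mem = self.used_mem + memory_gib > self.quota.memory_gib
        if over_cpu or over_mem:
            return False  # rejected: usage stays unchanged
        self.used_cpu += cpu_cores
        self.used_mem += memory_gib
        return True


# Illustrative numbers only (assumption, not from any real deployment).
bi_team = Tenant("bi-team", Quota(cpu_cores=32, memory_gib=128))
first = bi_team.admit(cpu_cores=16, memory_gib=64)   # fits under the quota
second = bi_team.admit(cpu_cores=24, memory_gib=32)  # would exceed the CPU quota
```

Because a request is either admitted within the quota or rejected outright, one tenant cannot grow into another tenant's share, which is the property that addresses the noisy-neighbor problem described above.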
2. Infrastructure agility: with bare-metal deployments, expanding capacity means buying and racking servers, formatting disks with corporate-approved images, and going through the complex process of adding the machines to the original cluster. IT departments often view the traditional architecture as an ‘appliance’ in the sense that the clusters require isolated islands of infrastructure.
With CDP Private Cloud, we deploy the analytic experiences on an enterprise’s container infrastructure. By running on a shared environment with other applications, we reduce the time and cost associated with deploying and scaling our applications. A cluster expansion that used to take 4-6 months from procurement to nodes online now takes 4-6 minutes.
In addition, unlike the traditional architecture, CDP Private Cloud detaches the compute engines from the physical infrastructure. The admin no longer designates a physical node as being an Impala node versus a Spark node. Instead, the services now see a pool of cores and memory, and the same physical machine that’s processing an Impala query for the BI team may also run a Spark job for a completely different line of business. This flexibility allows administrators to take away capacity from one application and give it to another one in minutes as opposed to weeks.
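The shift from dedicated nodes to a shared pool can be pictured with a small Python sketch. This is a toy model under stated assumptions, not how CDP Private Cloud or Kubernetes schedules work: the `ComputePool` class, its `resize` method, and the service names and core counts are hypothetical.

```python
from typing import Dict


class ComputePool:
    """Toy model of a shared pool of cores that several services draw from."""

    def __init__(self, total_cores: int) -> None:
        self.total_cores = total_cores
        self.allocations: Dict[str, int] = {}

    def free_cores(self) -> int:
        """Cores not currently assigned to any service."""
        return self.total_cores - sum(self.allocations.values())

    def resize(self, service: str, cores: int) -> None:
        """Set a service's allocation, refusing growth beyond the free capacity."""
        if cores < 0:
            raise ValueError("cores must be non-negative")
        current = self.allocations.get(service, 0)
        if cores - current > self.free_cores():
            raise ValueError("not enough free cores in the pool")
        self.allocations[service] = cores


# Illustrative service names and sizes (assumptions).
pool = ComputePool(total_cores=100)
pool.resize("impala-bi", 70)
pool.resize("spark-ds", 30)

# Shift 20 cores from the BI workload to the data science workload:
# shrink one allocation, then grow the other. No hardware changes hands.
pool.resize("impala-bi", 50)
pool.resize("spark-ds", 50)
```

In this model, moving capacity is a bookkeeping change against the pool rather than a re-designation of physical machines, which is why it can happen in minutes.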
3. Upgrade Agility: with the traditional cluster architecture, administrators could not easily upgrade just one tenant in the cluster. That’s because each CDH and HDP version shipped as a large, colocated, monolithic distribution made up of more than 30 interdependent components. Cloudera’s internal development and test processes ensured compatibility within one packaged distribution version, so the platform upgrade workflows required upgrading the entire platform and all tenants at the same time. For a large cluster, this operation can take over 6 months of planning.
With a container-based architecture, CDP can independently upgrade individual services. More importantly, admins will be able to offer multiple versions of a specific engine or service running concurrently on the platform, allowing end-users to upgrade their compute environments when it makes sense for them. This type of modularity reduces the complexity of upgrades, allowing administrators to offer the latest tools to end-users in minutes as opposed to months.
4. Self-serve provisioning: with a simpler multi-tenancy scheme, and a more agile infrastructure-provisioning system, administrators can empower end-users and local team administrators to create their own environments whenever they need one. By distributing the provisioning process closer to the end-users, central admins stop being the bottleneck to every incoming request. The result is end-users getting to productivity in minutes as opposed to weeks.
These four dimensions describe how CDP Private Cloud unlocks agility in the organization. But what about security, governance, and centralized controls? We know our customers value these critical aspects of the platform, but many of them find them increasingly difficult to enforce. The intense demand for access to the latest technologies leads to point solutions or ‘shadow IT’ systems that lack oversight and present a security risk to the organization. CDP Private Cloud does not compromise on the security and governance standards that earned us the trust of the largest enterprises. Every compute environment provisioned on CDP Private Cloud gets automatically connected to Ranger and Atlas. That means the environment has fine-grained access controls, audit trails, data lineage, and encryption configured and enforced before an end-user can run the very first query or job.
The recent announcement of CDP Private Cloud marks just the start of an exciting journey ahead for the Cloudera Data Platform. We now have the right foundation to carry our customers through many years of innovation. This foundation relies on principles of modularity, modern infrastructure management, and self-serve workflows that improve agility for the organization.
Interested in having a discussion about CDP Private Cloud? Sign up for this webinar to go into the details.