The introduction of CDP Public Cloud has dramatically reduced the time it takes to get up and running with Cloudera’s latest technologies, whether through the containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or through the multi-purpose, VM-based Data Hub style of deployment.
In CDP Private Cloud, the introduction of the Cloudera Data Warehouse and Cloudera Machine Learning experiences on Red Hat OpenShift Kubernetes clusters means that we can deploy new workloads on an existing Base cluster in under an hour. The installation of the CDP Private Cloud Base clusters themselves, however, has lagged behind.
Automation for CDP Private Cloud
Today we are launching the public release of Ansible-based automation for the deployment of CDP Private Cloud Base clusters, which can be installed on bare metal servers or virtual machines, in the data center or in the public cloud. We’ve been trialling this internally and with customers for a number of months and have proven its ability to operate in some of the most complex customer environments that we have.
Cloudera consultants and customers alike have been manually installing clusters for many years. The list of tasks is long (OS prerequisites, package and parcel repositories, supporting databases, key, certificate and truststore management, Kerberos configuration, service layout and configuration, audit configuration, post-installation steps, etc.) and it is prone to typos and misconfiguration, particularly on large clusters. By automating these tasks we can be much more prescriptive about how clusters are built, improve build quality and consistency, and free up consultants and administrators to focus on value-add tasks rather than repetitive installations. The declarative definition encourages knowledge sharing and configuration parity between environments.
“The bottom line is that automation lowers the risk of human error and adds some intelligence to the enterprise system.” – Stephen Elliot (IDC)
We are releasing the Ansible playbooks as part of Cloudera Labs under the Apache License, Version 2.0, and inviting customers and partners to collaborate by submitting an Individual or Corporate Contributor License Agreement (ICLA or CCLA) as appropriate.
There are two versions of the playbooks. Version 2 is an end-to-end playbook for installing bare-metal clusters; it will receive bug fixes but no other active development. Version 3 is rearchitected as a set of composable roles that can be installed as an Ansible Collection via Ansible Galaxy. The Cloudera Deploy project builds on these roles with a sample playbook that performs an end-to-end installation and can integrate with bare-metal and public or private virtual environments, all from simple, composable, declarative definitions.
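As a rough illustration of what installing the Version 3 roles as a collection can look like, here is a minimal sketch of an Ansible Galaxy requirements file. It assumes the collection is pulled directly from the GitHub repository listed under Resources; the file name requirements.yml is the usual convention rather than something this post prescribes, and the project’s own documentation may recommend a different source or a pinned version.

```yaml
# requirements.yml (illustrative sketch, not taken from the project docs)
collections:
  # Install the collection straight from its Git repository.
  # Add a "version:" key here to pin a branch, tag or commit.
  - name: https://github.com/cloudera-labs/cloudera.cluster.git
    type: git
```

With a file like this, running ansible-galaxy collection install -r requirements.yml would fetch the collection into the local collections path. Git sources for collections require Ansible 2.10 or later.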
Next Steps
Please review the documentation and how-to guides and try the playbooks out for yourself. If you’d like to get involved, please raise issues on the GitHub project; we also welcome pull requests from members of the community. If you’d like assistance with your CDP Upgrade or Migration project, or with using the automation, please contact your account team.
Resources:
Version 2 Getting Started Guide: https://github.com/cloudera-labs/cloudera.cluster/blob/v2.0.0/docs/getting-started.md
Version 2 GitHub project: https://github.com/cloudera-labs/cloudera.cluster/tree/v2.0.0
Version 3 Getting Started Guide: https://github.com/cloudera-labs/cloudera-deploy#readme
Version 3 GitHub project: https://github.com/cloudera-labs/cloudera.cluster/ and https://github.com/cloudera-labs/cloudera-deploy/
Note:
The Ansible playbooks are provided on an as-is basis without any warranty or support. The playbooks do, however, use supported APIs of Cloudera Manager and CDP, so support will be available where issues arise with the use of those products.
Acknowledgements:
The Ansible playbooks have been developed by a number of people across Cloudera. Thanks go to: David Beech, Webster Mudge, Mac Moore, Jim Halfpenny, Sai Krishna Kalyan, Dima Fadeyev, Chris Teoh, Matthew Weis, Denis Coady, Luciano Sorrentino, Chris Jacques, Venkata Udamala, Vijay Anand Karthikeyan, Michael O’Kane and François Frisch for their contributions so far.
Does it require a connection to the internet, or can it be used with images downloaded on premises?
As long as you are able to download all of the dependencies from Ansible Galaxy beforehand, you can definitely run this without an internet connection. In fact, we already have a customer doing this in production.