Our recent blog discussed the four paths from legacy platforms to CDP Private Cloud Base. In this blog and the accompanying video, we take a deep dive into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. The overall upgrade follows the seven-step process outlined below.
In the video below, we walk through a complete end-to-end upgrade of CDH to CDP Private Cloud Base.
Step 1: Preparing to Upgrade
Before proceeding with the upgrade, review the prerequisites specified in the documentation. We also recommend performing a full cluster health check, which our Professional Services team can help with. A good understanding of the current status and health of the cluster is critical to a successful upgrade.
We recommend installing Workload XM (WXM) and capturing a baseline of current workload performance, which lets you evaluate differences before and after the upgrade more accurately. Without this baseline, it can be difficult to understand how or why a workload is performing poorly after the upgrade has been completed.
It is also worth checking your applications' compatibility with the new component versions in CDP. If you are upgrading from CDH6, component versions will be broadly similar, whereas upgrading from CDH5 involves some bigger version jumps. At the very least, expect to review any API changes and recompile your applications. In some cases, replacing legacy components with their new CDP equivalents (for example, Sentry with Ranger) may require additional code changes to integrate fully with your operations.
Finally, we recommend that you take a full backup of your cluster, including:
- ZooKeeper data
- HDFS Master Node data directories
- Navigator KMS, KTS, and KeyHSM
- Cloudera Manager data
As of CDP Private Cloud Base 7.1.6, upgrades from both CDH5 and CDH6 can be fully rolled back; however, rollback requires restoring data from the backups above.
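Because rollback depends on these backups, it helps to confirm that each one exists and is recent before continuing. The following is a minimal sketch under our own assumptions: the component list mirrors the one above, while the `BackupRecord` type, the `missing_or_stale_backups` helper, and the one-day freshness limit are hypothetical and not part of Cloudera Manager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Components taken from the backup list in this post.
REQUIRED_BACKUPS = (
    "ZooKeeper data",
    "HDFS master node data directories",
    "Navigator KMS, KTS, and KeyHSM",
    "Cloudera Manager data",
)


@dataclass
class BackupRecord:
    """Hypothetical record of one completed backup."""
    location: str
    taken_at: datetime


def missing_or_stale_backups(records, now, max_age=timedelta(days=1)):
    """List required components with no backup, or only one older than max_age.

    records: hypothetical mapping of component name -> BackupRecord.
    max_age: arbitrary example freshness limit, not a Cloudera recommendation.
    """
    problems = []
    for component in REQUIRED_BACKUPS:
        record = records.get(component)
        if record is None:
            problems.append(f"{component}: no backup recorded")
        elif now - record.taken_at > max_age:
            problems.append(f"{component}: backup older than {max_age}")
    return problems
```

You might populate the records from your own backup tooling; an empty result means every listed component has a recent backup recorded.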
Step 2: Pre-Upgrade Transition Steps
- Transition from MR1 to MR2 (CDH5 only)
- Prepare for new collections for Solr (CDH5 only)
- Export Sentry policies ready for import into Apache Ranger
- Migrate Hive 1 or 2 workloads to Hive 3
- HBase pre-upgrade checks (CDH5 and CDH6)
- Replication Manager checks
- Hue dependencies
We recommend that all customers test workloads in a dev or test cluster before upgrading to CDP in production.
Step 3: Upgrading the JDK
CDP supports OpenJDK 1.8 and 11, and Oracle JDK 1.8. If JDK 1.6 or 1.7 is in use, upgrade it before upgrading Cloudera Manager. Also note the documentation's warnings about specific JDK versions.
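One way to audit hosts is to parse the first line of `java -version` output (which the JVM writes to standard error) and compare it with the combinations listed above. This is a sketch, not a Cloudera tool: the helper names are hypothetical, and it assumes Oracle JDK reports `java version "..."` while OpenJDK builds report `openjdk version "..."`, which may not hold for every vendor's build.

```python
import re

# Vendor/major-version pairs listed as supported in this post.
SUPPORTED_JDKS = {("openjdk", 8), ("openjdk", 11), ("oracle", 8)}

# First line of `java -version`, e.g. 'openjdk version "11.0.11" 2021-04-20'.
_VERSION_LINE = re.compile(r'^(openjdk|java) version "([^"]+)"')


def parse_java_version(first_line):
    """Return (vendor, major_version) from the first line of `java -version`.

    Assumption: Oracle JDK prints 'java version', OpenJDK builds print 'openjdk version'.
    """
    match = _VERSION_LINE.match(first_line.strip())
    if match is None:
        raise ValueError(f"unrecognized java -version output: {first_line!r}")
    vendor = "openjdk" if match.group(1) == "openjdk" else "oracle"
    parts = match.group(2).split(".")
    # Legacy scheme "1.8.0_292" encodes major 8; newer scheme "11.0.11" starts with the major.
    major_text = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    digits = re.match(r"\d+", major_text)
    if digits is None:
        raise ValueError(f"unrecognized Java version: {match.group(2)!r}")
    return vendor, int(digits.group())


def is_supported_jdk(first_line):
    """True if the parsed vendor/major pair appears in SUPPORTED_JDKS."""
    return parse_java_version(first_line) in SUPPORTED_JDKS
```

On a host, the input line could be obtained with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.splitlines()[0]`.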
Step 4a: Upgrading the Operating System
CDP supports Red Hat Enterprise Linux and CentOS 7.6+ and 8.2, Ubuntu 18.04 and 20.04, and SLES 12 SP5. If you are running an older operating system version, upgrade it before the cluster upgrade begins.
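The operating system rules above can be expressed as a small check. This sketch is hypothetical: it assumes you already know each host's distribution and version, for example from the `ID` and `VERSION_ID` fields of `/etc/os-release`, and the mapping of SLES 12 SP5 to version `12.5` is an assumption. CentOS 7 may report only the major version in `VERSION_ID`, in which case you would need the full release from `/etc/centos-release`.

```python
def _version_tuple(version):
    """'7.9' -> (7, 9); '20.04' -> (20, 4)."""
    return tuple(int(part) for part in version.split("."))


def is_supported_os(distro, version):
    """Check an OS against the list in this post.

    distro: os-release style ID ('rhel', 'centos', 'ubuntu', 'sles').
    version: dotted numeric version, e.g. '7.9' or '20.04'.
    """
    v = _version_tuple(version)
    if distro in ("rhel", "centos"):
        # Any 7.x release from 7.6 onwards, or 8.2 specifically.
        return (v[0] == 7 and v >= (7, 6)) or v[:2] == (8, 2)
    if distro == "ubuntu":
        return v[:2] in ((18, 4), (20, 4))
    if distro == "sles":
        # Assumption: SLES 12 SP5 reports VERSION_ID "12.5".
        return v[:2] == (12, 5)
    return False
```

A bare major version such as `"7"` is deliberately rejected, since it cannot show whether the host is on 7.6 or later.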
Step 4b: Upgrading the RDBMS
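CDP supports MariaDB 10.2 to 10.4, MySQL 5.7 and 8.0, PostgreSQL 10, 11, and 12, and Oracle Database 12c, 19c, and 19.9. As with the operating system, upgrade any database running an older version before the cluster upgrade.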
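Database versions can be checked against this list using the version string the server reports, for example from `SELECT VERSION();` on MySQL or MariaDB, or `SHOW server_version;` on PostgreSQL. The sketch below is hypothetical: it reads only the leading major and minor numbers, and it treats 19.9 as part of the Oracle 19c release line.

```python
import re

# Supported database versions from this post, as predicates on (major, minor).
SUPPORTED_DATABASES = {
    "mariadb": lambda major, minor: major == 10 and 2 <= minor <= 4,
    "mysql": lambda major, minor: (major, minor) in {(5, 7), (8, 0)},
    "postgresql": lambda major, minor: major in (10, 11, 12),
    # 12c and 19c; 19.9 is assumed to fall within the 19c line.
    "oracle": lambda major, minor: major in (12, 19),
}


def is_supported_db(engine, version):
    """Check a server-reported version string such as '10.4.19-MariaDB' or '12.7'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", version)
    if match is None or engine not in SUPPORTED_DATABASES:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return SUPPORTED_DATABASES[engine](major, minor)
```

Suffixes such as `-MariaDB`, `-log`, or a PostgreSQL build description are ignored because only the leading numbers are parsed.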
Step 5: Upgrading Cloudera Manager
Back up Cloudera Manager before upgrading it, including its RDBMS databases and any Cloudera Management Service directories.
The Cloudera Manager Server and Cloudera Manager Agent are updated through your operating system's package management system. First, update the configured package repository to point at the new Cloudera Manager release, then run the upgrade commands.
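As an illustration of that sequence on the Cloudera Manager Server host, the sketch below builds (but does not run) the commands for yum- or apt-based systems. It assumes the repository definition already points at the new Cloudera Manager release and that commands run as root. The `cm_upgrade_plan` helper is hypothetical; the package and service names are the standard Cloudera Manager ones, but check the upgrade documentation for the complete procedure for your OS, which may include additional steps.

```python
import shlex

# Standard Cloudera Manager package names on the server host.
CM_PACKAGES = ("cloudera-manager-server", "cloudera-manager-daemons", "cloudera-manager-agent")


def cm_upgrade_plan(package_manager):
    """Return the command sequence (argv lists) for a dry run on the CM Server host.

    Hypothetical helper: assumes the repo file already targets the new release.
    """
    if package_manager == "yum":
        refresh = [["yum", "clean", "all"]]
        upgrade = [["yum", "upgrade", "-y", *CM_PACKAGES]]
    elif package_manager == "apt":
        refresh = [["apt-get", "update"]]
        upgrade = [["apt-get", "install", "--only-upgrade", "-y", *CM_PACKAGES]]
    else:
        raise ValueError(f"unsupported package manager: {package_manager}")
    # Stop the server and local agent before upgrading; start the agent first afterwards.
    stop = [["systemctl", "stop", "cloudera-scm-server"], ["systemctl", "stop", "cloudera-scm-agent"]]
    start = [["systemctl", "start", "cloudera-scm-agent"], ["systemctl", "start", "cloudera-scm-server"]]
    return stop + refresh + upgrade + start


if __name__ == "__main__":
    # Print the plan for review instead of executing it.
    for argv in cm_upgrade_plan("yum"):
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review before anything is executed.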
Once the Cloudera Manager Server has restarted and all agents are checking in, upgrade the Cloudera Management Service from the web UI.
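Before moving on, you can confirm that every agent is checking in. A minimal sketch, assuming you have collected each host's last heartbeat time (for example, from the Last Heartbeat column on the Hosts page); the `agents_not_checking_in` helper and the two-minute threshold are hypothetical choices, not Cloudera Manager defaults.

```python
from datetime import datetime, timedelta, timezone


def agents_not_checking_in(expected_hosts, last_heartbeats, now, max_age=timedelta(minutes=2)):
    """Return {host: reason} for agents without a recent heartbeat.

    expected_hosts: hostnames that should be running a Cloudera Manager Agent.
    last_heartbeats: hypothetical mapping of hostname -> time of last heartbeat.
    max_age: hypothetical staleness threshold.
    """
    problems = {}
    for host in sorted(expected_hosts):
        seen = last_heartbeats.get(host)
        if seen is None:
            problems[host] = "no heartbeat recorded"
        elif now - seen > max_age:
            problems[host] = f"last heartbeat {now - seen} ago"
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    beats = {"host1": now - timedelta(seconds=15), "host2": now - timedelta(minutes=10)}
    print(agents_not_checking_in({"host1", "host2", "host3"}, beats, now))
```

An empty result suggests all expected agents are reporting and it is reasonable to continue.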
Step 6: Upgrading CDH to CDP Runtime
The first step of the upgrade is to configure Cloudera Manager (CM) to see the new parcels; from there, you launch the Upgrade Wizard from the Parcels page.
The wizard will guide you through the following steps:
- Resolve Spark2 alternatives priority (CDH5 only)
- Add Tez Service – required for Hive 3.
- Add New Solr Service – Ranger requires a dedicated Solr instance for audit logs.
  - Note: This instance runs on a separate port from any other Solr instances serving business use cases.
- Add YARN Queue Manager – a user interface for managing YARN queues.
- Fair Scheduler to Capacity Scheduler – We provide the fs2cs command-line tool for migrating from Fair Scheduler to Capacity Scheduler, but we recommend carefully reviewing and tuning the Capacity Scheduler configuration both before and after the upgrade.
- Add Hive on Tez Service
  - Note: The HiveServer2 role moves to this service and should no longer be accessed under the Hive service in Cloudera Manager.
- Add Ranger Service – Ranger replaces Sentry and the parts of Navigator focused on auditing.
- Install Atlas – Atlas replaces Navigator for lineage and cataloging:
  - Add Kafka Service – required by Atlas if not already installed
  - Add HBase Service – required by Atlas if not already installed
  - Add Atlas Service
  - Navigator to Atlas migration
- Set TLS settings – Ensure that all keystore and truststore settings are configured; otherwise, services may fail to connect to Ranger or Atlas during the upgrade.
- Export Sentry permissions
  - As of CM 7.4.4, this step is automated: the exported permissions are later converted to Ranger policies and imported automatically during the Upgrade Wizard process.
- Back up cluster metadata and the CM, Hive, and Oozie databases
- Run the upgrade
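To see why the Capacity Scheduler configuration deserves careful review after the Fair Scheduler to Capacity Scheduler step, consider how sibling Fair Scheduler weights map onto capacity percentages. The sketch below is a simplified illustration of that arithmetic, not the fs2cs tool itself: each queue receives `weight / sum(weights) * 100`, and rounding drift goes to the last queue so siblings total 100. The helper names are hypothetical. The `yarn.scheduler.capacity.<queue-path>.capacity` property name follows the Capacity Scheduler convention, but fs2cs converts many more settings, and its output may differ by version and options.

```python
def weights_to_capacities(weights, decimals=3):
    """Map sibling Fair Scheduler queue weights to Capacity Scheduler percentages.

    Hypothetical, simplified illustration: each queue gets weight / total * 100,
    and any rounding drift is added to the last queue so siblings sum to 100.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("queue weights must sum to a positive number")
    names = list(weights)
    capacities = {name: round(weights[name] / total * 100, decimals) for name in names}
    drift = round(100 - sum(capacities.values()), decimals)
    capacities[names[-1]] = round(capacities[names[-1]] + drift, decimals)
    return capacities


def capacity_properties(parent_path, capacities):
    """Render results as yarn.scheduler.capacity.<queue-path>.capacity entries."""
    return {
        f"yarn.scheduler.capacity.{parent_path}.{name}.capacity": value
        for name, value in capacities.items()
    }


if __name__ == "__main__":
    caps = weights_to_capacities({"etl": 2, "adhoc": 1, "default": 1})
    for key, value in capacity_properties("root", caps).items():
        print(key, value)
```

Weights only express relative shares, so capacities computed this way are a starting point to tune rather than a guaranteed equivalent of the old scheduling behavior.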
Step 7: Post Upgrade Steps
Several post-upgrade steps must be completed after the Upgrade Wizard finishes. They prepare the system for final testing and validation, and they cover additional configuration and runtime changes to be aware of in your CDP cluster. Review the CDH5 or CDH6 post-upgrade documentation, as appropriate, for the specific tasks required when upgrading from each release.
Completion and Finalization
Once the upgrade is complete, all services should be up and running. At this point, perform another health check and ensure that all services are working correctly. You can then re-baseline your workloads and use WXM to compare performance before and after the upgrade.
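With both baselines available, a before-and-after comparison can be as simple as comparing median runtimes per workload. This is a hypothetical sketch that assumes you have exported per-job runtimes (in seconds) from WXM or your scheduler into plain dictionaries; the `runtime_regressions` helper and the 20% threshold are illustrative choices, not WXM features.

```python
from statistics import median


def runtime_regressions(baseline, after, threshold=0.2):
    """Return {job: relative_change} for jobs whose median runtime grew by more than threshold.

    baseline, after: hypothetical mappings of job name -> list of runtimes in seconds,
    captured before and after the upgrade. threshold=0.2 flags slowdowns above 20%.
    """
    flagged = {}
    for job, before_runs in baseline.items():
        after_runs = after.get(job)
        if not before_runs or not after_runs:
            continue  # job missing from one side; nothing to compare
        before_median = median(before_runs)
        if before_median <= 0:
            continue  # avoid dividing by zero for degenerate baselines
        change = (median(after_runs) - before_median) / before_median
        if change > threshold:
            flagged[job] = round(change, 3)
    return flagged
```

Medians are used rather than means so that a single outlier run does not dominate the comparison; jobs that appear only after the upgrade are ignored because they have no baseline.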
Once you are satisfied with the state of the upgrade, you can finalize the HDFS metadata. Important: until this step has been performed, blocks deleted after the upgrade are retained rather than removed, which is what makes rollback possible. Do not perform the finalization step until you are absolutely ready! Once you have finalized HDFS, you cannot roll back.
The end-to-end process is relatively straightforward and mainly wizard-driven. Take care to test applications and workloads in lower environments and to resolve any incompatibilities before upgrading production.
Watch the video above of an actual cluster upgrade, and contact your account team or Cloudera support if you would like to discuss the next steps in your CDP journey.
For additional information on the upgrade process, please see