Upgrade to CDP Private Cloud Base – A Step by Step Guide

Our recent blog discussed the four paths to get from legacy platforms to CDP Private Cloud Base. In this blog and accompanying video, we deep dive into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. The overall upgrade follows the seven-step process outlined below.

In the video below, we walk through a complete end-to-end upgrade of CDH to CDP Private Cloud Base.

Step 1: Preparing to Upgrade

Before proceeding with the upgrade, review the prerequisites specified in the documentation. We also recommend performing a full cluster health check, which our Professional Services team can help with. A good understanding of the current status and health of the cluster is critical to a successful upgrade.

Cloudera Support also provides a set of validations that run against diagnostic data; review these as well.

We recommend installing Workload XM (WXM) and capturing a baseline of current workload performance, which allows you to evaluate differences before and after the upgrade more accurately. Without this baseline, it may be difficult to understand how or why a workload is performing poorly once the upgrade has been completed.

It is also worth checking your applications' compatibility with the new component versions in CDP. If you are upgrading from CDH6, component versions will be very similar, whereas the version jumps from CDH5 are bigger. At the very least, expect to review any API changes and recompile your applications. In some cases, replacing a legacy component with its new CDP equivalent may require additional code changes to integrate fully with your operations.

Finally, we recommend that you take a full backup of your cluster, including:

  • RDBMS
  • ZooKeeper data
  • HDFS Master Node data directories
  • Navigator KMS, KTS, and KeyHSM
  • Cloudera Manager data

Full backup instructions are available in the documentation for CDH5 and CDH6.

As of CDP Private Cloud Base 7.1.6, upgrades from CDH5 and CDH6 can be fully rolled back; however, rollback requires restoring data from the backups above.
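
Because a rollback depends on these backups, it can help to confirm that every item on the checklist was backed up after your change window opened. The following Python sketch assumes a hypothetical manifest recording when each backup finished; the `stale_or_missing_backups` helper and the manifest format are illustrative only and are not part of any Cloudera tool.

```python
from datetime import datetime, timedelta

# Items from the backup checklist above. Drop any that do not apply to your
# cluster (for example, if Navigator encryption components are not deployed).
REQUIRED_BACKUPS = (
    "RDBMS",
    "ZooKeeper data",
    "HDFS Master Node data directories",
    "Navigator KMS, KTS, and KeyHSM",
    "Cloudera Manager data",
)


def stale_or_missing_backups(manifest, window_start):
    """Return checklist items with no backup, or a backup older than window_start.

    manifest is a hypothetical dict mapping each item name to the datetime at
    which its backup completed.
    """
    return [
        item
        for item in REQUIRED_BACKUPS
        if manifest.get(item) is None or manifest[item] < window_start
    ]


if __name__ == "__main__":
    start = datetime(2021, 6, 1, 8, 0)
    manifest = {
        "RDBMS": start + timedelta(minutes=30),
        "ZooKeeper data": start + timedelta(minutes=35),
        "HDFS Master Node data directories": start - timedelta(days=2),
        "Cloudera Manager data": start + timedelta(minutes=40),
    }
    # Reports the two-day-old HDFS backup and the missing Navigator backup.
    print(stale_or_missing_backups(manifest, start))
```

Anything the helper returns should be backed up again before you continue.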

Step 2: Pre-Upgrade Transition Steps

Instruction details differ for CDH5 and CDH6, but the basics are the same. You will need to prepare for the component changes in CDP, including:

  • Transition from MR1 to MR2 (CDH5 only)
  • Prepare for new collections for Solr (CDH5 only)
  • Export Sentry policies for Apache Ranger
  • Migrate Hive 1 or 2 workloads to Hive 3
  • HBase pre-upgrade checks (CDH5 and CDH6)
  • Replication Manager checks
  • Hue dependencies
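
Since some of these tasks apply only to CDH5, one simple way to build a per-cluster checklist is to tag each task with the releases it applies to. This is a minimal sketch: it assumes that items not marked "CDH5 only" apply to both releases, and `transition_tasks` is a hypothetical helper.

```python
# Pre-upgrade transition tasks from the list above, each tagged with the source
# releases it applies to. Items not marked "CDH5 only" are assumed to apply to both.
TRANSITION_TASKS = (
    ("Transition from MR1 to MR2", {"CDH5"}),
    ("Prepare for new collections for Solr", {"CDH5"}),
    ("Export Sentry policies for Apache Ranger", {"CDH5", "CDH6"}),
    ("Migrate Hive 1 or 2 workloads to Hive 3", {"CDH5", "CDH6"}),
    ("HBase pre-upgrade checks", {"CDH5", "CDH6"}),
    ("Replication Manager checks", {"CDH5", "CDH6"}),
    ("Hue dependencies", {"CDH5", "CDH6"}),
)


def transition_tasks(source_release):
    """Return the transition tasks that apply when upgrading from source_release."""
    if source_release not in ("CDH5", "CDH6"):
        raise ValueError(f"expected 'CDH5' or 'CDH6', got {source_release!r}")
    return [task for task, releases in TRANSITION_TASKS if source_release in releases]


if __name__ == "__main__":
    # Print an empty checklist for a CDH6 cluster.
    for task in transition_tasks("CDH6"):
        print("[ ]", task)
```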

We recommend that all customers test workloads in a dev or test cluster before upgrading to CDP in production.

Step 3: Upgrading the JDK

CDP supports OpenJDK 1.8 and 11 and Oracle JDK 1.8. If JDK 1.6 or 1.7 is in use, upgrade it before upgrading Cloudera Manager. Note the warnings about specific JDK versions in the documentation.
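
To check a host against this list, you can parse the first line of `java -version` output, which the JDK writes to standard error. The sketch below is illustrative: `parse_java_version` and `jdk_supported` are hypothetical helpers, and treating a `java version` banner as Oracle JDK is a heuristic that other vendors' builds may not follow.

```python
import re

# Support matrix as stated above: OpenJDK 8 and 11, Oracle JDK 8.
SUPPORTED_JDKS = {("openjdk", 8), ("openjdk", 11), ("oracle", 8)}

# First line of `java -version` output, e.g. 'openjdk version "11.0.11" 2021-04-20'.
_VERSION_LINE = re.compile(r'(openjdk|java) version "([^"]+)"')


def parse_java_version(first_line):
    """Return (vendor, major_version) parsed from the first line of `java -version`."""
    match = _VERSION_LINE.match(first_line.strip())
    if match is None:
        raise ValueError(f"unrecognised java -version output: {first_line!r}")
    # Heuristic assumption: a plain "java version" banner means Oracle JDK.
    vendor = "openjdk" if match.group(1) == "openjdk" else "oracle"
    parts = match.group(2).split(".")
    # JDK 8 and earlier report "1.<major>..."; JDK 9 and later report "<major>...".
    major_field = parts[1] if parts[0] == "1" and len(parts) > 1 else parts[0]
    digits = re.match(r"\d+", major_field)
    if digits is None:
        raise ValueError(f"cannot find a major version in {match.group(2)!r}")
    return vendor, int(digits.group())


def jdk_supported(first_line):
    """True if the JDK described by first_line is in SUPPORTED_JDKS."""
    return parse_java_version(first_line) in SUPPORTED_JDKS
```

On a host, `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr.splitlines()[0]` yields the line to pass in.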

Step 4a: Upgrading the Operating System

CDP supports Red Hat Enterprise Linux and CentOS 7.6+ and 8.2, Ubuntu 18.04 and 20.04, and SLES 12 SP5. If you are running an older operating system version, upgrade it before starting the cluster upgrade.
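
Most of these distributions describe themselves in `/etc/os-release` through the `ID` and `VERSION_ID` fields, so a quick pre-flight check can read those and compare them with the list above. This is a sketch only: `parse_os_release` and `os_supported` are hypothetical helpers, mapping SLES 12 SP5 to `VERSION_ID="12.5"` is an assumption, and on CentOS 7 `VERSION_ID` may carry only the major version, in which case the helper reports the host as unsupported and you should read the full release from `/etc/centos-release` instead.

```python
def parse_os_release(text):
    """Parse KEY=value lines in /etc/os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def _major_minor(version_id):
    """Turn "7.9", "20.04" or "7" into a (major, minor) tuple; minor defaults to 0."""
    parts = [int(p) for p in version_id.split(".")]
    while len(parts) < 2:
        parts.append(0)
    return parts[0], parts[1]


def os_supported(distro_id, version_id):
    """Check an os-release ID/VERSION_ID pair against the support list above."""
    major, minor = _major_minor(version_id)
    if distro_id in ("rhel", "centos"):
        return (major == 7 and minor >= 6) or (major, minor) == (8, 2)
    if distro_id == "ubuntu":
        return (major, minor) in ((18, 4), (20, 4))
    if distro_id == "sles":
        # Assumption: SLES 12 SP5 reports VERSION_ID="12.5".
        return (major, minor) == (12, 5)
    return False
```

On a host, parse the contents of `/etc/os-release` and pass `info["ID"]` and `info["VERSION_ID"]` to `os_supported`.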

Step 4b: Upgrading the RDBMS

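CDP supports MariaDB 10.2 to 10.4, MySQL 5.7 and 8.0, PostgreSQL 10, 11 and 12, and Oracle Database 12c, 19c and 19.9. As with the operating system, upgrade any database that is not on one of these versions before the cluster upgrade.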
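
Database servers report versions with vendor-specific suffixes (for example `8.0.23-log` or `10.4.13-MariaDB`), so a check against this list is easiest on the leading major and minor numbers. The sketch below is hypothetical; `db_supported` is an illustrative helper, and it assumes that "12c" covers Oracle 12.x and that "19c" and "19.9" together cover Oracle 19.x.

```python
import re

# Leading "<major>.<minor>" of a version string; anything after it is ignored.
_VERSION = re.compile(r"(\d+)(?:\.(\d+))?")


def _major_minor(version):
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def db_supported(engine, version):
    """Check a database engine and reported version against the list above."""
    major, minor = _major_minor(version)
    if engine == "mariadb":
        return major == 10 and 2 <= minor <= 4
    if engine == "mysql":
        return (major, minor) in {(5, 7), (8, 0)}
    if engine == "postgresql":
        # From PostgreSQL 10 onward the major version is the first number.
        return major in {10, 11, 12}
    if engine == "oracle":
        # Assumption: 12c means 12.x; 19c and 19.9 mean 19.x.
        return major in {12, 19}
    raise ValueError(f"unknown engine: {engine!r}")
```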

Step 5: Upgrading Cloudera Manager

Back up Cloudera Manager before the upgrade, including its RDBMS and any Cloudera Management Service directories.

The Cloudera Manager Server and Cloudera Manager Agent are upgraded through your operating system's package management system. First, update the configured repository, then run the upgrade commands.
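
The exact commands depend on your distribution. As a sketch, the plan below stops the Cloudera Manager Server and Agent on the server host, refreshes the package metadata, upgrades the `cloudera-manager-server`, `cloudera-manager-daemons` and `cloudera-manager-agent` packages, and starts the services again. It assumes the repository definition already points at the new Cloudera Manager version. The command sequence and flags are assumptions to confirm against the upgrade documentation for your target version, and the hypothetical `cm_server_upgrade_plan` helper only builds the commands; it runs nothing.

```python
import shlex

CM_PACKAGES = (
    "cloudera-manager-server",
    "cloudera-manager-daemons",
    "cloudera-manager-agent",
)

# Assumed repository-refresh and upgrade commands for each package manager;
# confirm against the Cloudera Manager upgrade guide for your target version.
PACKAGE_STEPS = {
    "yum": [["yum", "clean", "all"], ["yum", "upgrade", "-y", *CM_PACKAGES]],
    "apt": [["apt-get", "update"], ["apt-get", "install", "-y", "--only-upgrade", *CM_PACKAGES]],
    "zypper": [["zypper", "clean", "--all"], ["zypper", "--non-interactive", "update", *CM_PACKAGES]],
}


def cm_server_upgrade_plan(package_manager):
    """Return the commands (argv lists) to upgrade CM on the server host, in order.

    This only builds a plan; nothing is executed.
    """
    if package_manager not in PACKAGE_STEPS:
        raise ValueError(f"unsupported package manager: {package_manager!r}")
    stop = [["systemctl", "stop", "cloudera-scm-server"], ["systemctl", "stop", "cloudera-scm-agent"]]
    start = [["systemctl", "start", "cloudera-scm-server"], ["systemctl", "start", "cloudera-scm-agent"]]
    return stop + PACKAGE_STEPS[package_manager] + start


if __name__ == "__main__":
    # Print the plan for review rather than executing it.
    for argv in cm_server_upgrade_plan("yum"):
        print("sudo", shlex.join(argv))
```

Printing the plan first keeps a human in the loop; the same argv lists could later be passed to `subprocess.run` once reviewed.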

Once the Cloudera Manager Server has restarted and all the agents are checking in, upgrade the Cloudera Management Service through the web UI.
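
Before moving on, it is worth confirming that no agent has gone quiet. A minimal sketch, assuming you have collected each host's last heartbeat time (for example, from the Hosts page in Cloudera Manager) into a dictionary; `hosts_not_checking_in` and its two-minute threshold are hypothetical choices.

```python
from datetime import datetime, timedelta


def hosts_not_checking_in(last_heartbeats, now, max_age=timedelta(minutes=2)):
    """Return hostnames whose last agent heartbeat is missing or older than max_age.

    last_heartbeats maps hostname -> datetime of the last heartbeat (or None).
    The two-minute default is an arbitrary example threshold.
    """
    return sorted(
        host
        for host, seen in last_heartbeats.items()
        if seen is None or now - seen > max_age
    )


if __name__ == "__main__":
    now = datetime(2021, 6, 1, 12, 0)
    heartbeats = {
        "worker-1": now - timedelta(seconds=10),
        "worker-2": now - timedelta(minutes=5),
        "worker-3": None,
    }
    print(hosts_not_checking_in(heartbeats, now))
```

An empty result is the signal to proceed to the Cloudera Management Service upgrade.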

Step 6: Upgrading CDH to CDP Runtime

First, configure Cloudera Manager (CM) to see the new CDP Runtime parcels, then launch the upgrade wizard from the Parcels page.

The wizard will guide you through the following steps:

  1. Resolve Spark2 alternatives priority – for CDH5 only
  2. Add Tez Service – this is required for Hive 3.
  3. Add New Solr Service – Ranger requires a dedicated Solr instance for audit logs.
    1. Note: This runs on a separate port from other Solr instances running business-focused use cases.
  4. Add YARN Queue Manager – A user interface for managing YARN queues
    1. Fair Scheduler to Capacity Scheduler – We provide an fs2cs command-line tool for migrating from Fair Scheduler to Capacity Scheduler, but recommend that you carefully review and tune the Capacity Scheduler configuration before and after the upgrade.
  5. Add Hive on Tez Service
    1. Note: The HiveServer2 role is moved to this service and should no longer be accessed under the Hive service within Cloudera Manager.
  6. Add Ranger Service – Ranger replaces Sentry and the auditing functions of Navigator.
  7. Install Atlas – Atlas replaces Navigator for lineage and cataloging.
    1. Add Kafka Service – Required for Atlas if it’s not already installed
    2. Add HBase Service – Required for Atlas if it’s not already installed
    3. Add Atlas Service
  8. Navigator to Atlas migration
  9. Set TLS settings – Ensure that all keystore and truststore settings are configured; otherwise, services may have trouble connecting to Ranger or Atlas during the upgrade.
  10. Export Sentry permissions
    1. As of CM 7.4.4, this step is automated; the exported permissions are later converted to Ranger policies and imported automatically during the Upgrade Wizard process.
  11. Back Up Cluster Metadata and Databases for CM, Hive and Oozie
  12. Run Upgrade

Step 7: Post Upgrade Steps

There are several post-upgrade steps that must be completed after the Upgrade Wizard finishes. These steps will help prepare the system for final testing and validation, and they cover additional configuration and run-time changes to be aware of with your CDP cluster. Review the CDH5 and CDH6 post-upgrade documentation to understand the specific tasks required for coming from each release.

Completion and Finalization

Once the upgrade is complete, all services should be up and running. At this point, perform another health check and ensure that all services are working correctly. You can then rebaseline workloads and use WXM to perform a before-and-after comparison.
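
WXM performs this comparison for you; if you also export per-workload run times from the two baselines, a quick scripted check can highlight regressions. This is a sketch under assumptions: the input is a hypothetical dictionary of workload name to duration in seconds, `regressions` is an illustrative helper, and the 10% tolerance is an arbitrary example.

```python
def regressions(baseline, current, tolerance=0.10):
    """Return {workload: fractional slowdown} for workloads slower by more than tolerance.

    baseline and current map workload name -> duration in seconds. Workloads
    missing from either run, or with a non-positive baseline, are skipped.
    """
    flagged = {}
    for name, before in baseline.items():
        after = current.get(name)
        if after is None or before <= 0:
            continue
        change = (after - before) / before
        if change > tolerance:
            flagged[name] = round(change, 3)
    return flagged


if __name__ == "__main__":
    before = {"etl_daily": 600.0, "report": 120.0, "adhoc": 30.0}
    after = {"etl_daily": 630.0, "report": 150.0}
    print(regressions(before, after))
```

Here `etl_daily` is 5% slower and stays under the threshold, while `report` is 25% slower and is flagged for investigation.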

Once you are happy with the status of the upgrade, you can finalize the HDFS metadata. Important: until this step has been performed, blocks deleted since the upgrade are not physically removed, which is what makes rollback possible. Do not perform the finalization step until you are absolutely ready! Once you have finalized HDFS, you cannot roll back.
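
Because finalization is irreversible, some teams gate it behind an explicit checklist. The sketch below is illustrative: the checklist items are hypothetical examples drawn from this section, `finalize_command` is a hypothetical helper, and the returned `hdfs dfsadmin -finalizeUpgrade` command is the underlying HDFS operation; on a Cloudera Manager-managed cluster you would normally finalize through Cloudera Manager instead.

```python
# Hypothetical sign-off items based on this section; adapt to your own runbook.
FINALIZE_CHECKLIST = (
    "post-upgrade health check passed",
    "all services verified",
    "before-and-after workload comparison reviewed",
    "application owners signed off",
)


def finalize_command(completed):
    """Return the HDFS finalize command, or raise if any checklist item is open."""
    missing = [item for item in FINALIZE_CHECKLIST if item not in completed]
    if missing:
        raise RuntimeError("not ready to finalize; outstanding: " + ", ".join(missing))
    # Irreversible: once this runs, the HDFS upgrade can no longer be rolled back.
    return ["hdfs", "dfsadmin", "-finalizeUpgrade"]
```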

Summary

The end-to-end process is relatively straightforward and mainly wizard-driven. Take care to test applications and workloads in lower environments and to iron out any incompatibilities before upgrading production.

Review the video above of an actual cluster upgrade, and contact your account team or Cloudera Support if you would like to discuss the next steps in your CDP journey.

For additional information on the upgrade process, please see 

Tristan Stevens
Director of Technology, CDP Centre of Excellence