Like most of our customers, Cloudera relies heavily on data for its internal operations. For more than a decade, Cloudera has built internal tools and data analysis primarily on a single production CDH cluster. This cluster runs workloads for every department – from real-time user interfaces for Support, to recommendations in the Cloudera Data Platform (CDP) Upgrade Advisor, to business analysis and closing our books. In this blog, we discuss our journey to CDP for this critical cluster. You can learn more about how we moved to CDP.
Our Internal Environment Before Upgrade
The most important step in a successful upgrade to CDP Private Cloud Base is understanding your environment. Which specific components (like HBase, Impala, etc.) you rely on, the age of your infrastructure, and the characteristics of your workloads all impact the complexity of a move to CDP. We started by looking at the CDP Upgrade Documentation, paying particular attention to Requirements and Supported Versions and the Pre-upgrade Transition Steps, which call out the parts of the product that have changed the most. In our case, we faced several notable challenges:
- A single long-running cluster with many workloads. Over a decade of operations, our cluster has built up a huge variety of workloads. Because we are architected in a single multi-tenant cluster, an upgrade to CDP requires all workloads for all teams to be prepared simultaneously.
- Aging infrastructure. While our team has always been quick to adopt new Cloudera products, we have not been similarly fast with our infrastructure. In our case, upgrading to CDP meant major upgrades of our operating systems and RDBMS, plus a minor Java upgrade. The CDP Upgrade Advisor identified most of these for us.
- 24×7 business-critical use cases. Our support organization uses a custom case-tracking system built on our software to interact with customers. We do everything we can to minimize this tool’s downtime, which presents a unique challenge during major upgrades.
Despite these challenges, we knew we wanted to go to CDP Private Cloud. Our cluster was running CDH 5.16.2 before the upgrade, so we were missing out on innovation from many major version upgrades in CDP for products we rely on, like HBase and Solr. More importantly, upgrading to CDP gives us the option to add a suite of new capabilities and a more modern data platform for the future.
Preparing to Move to CDP
The first decision you have to make when going to CDP is whether you will upgrade your existing cluster or migrate to a new one. After going through a pre-upgrade assessment with our Professional Services team, we chose to upgrade the existing cluster for a few reasons:
- Firstly, we wanted to get fully moved to CDP as quickly as possible. With a migration, we would move each workload to CDP one by one, which we estimated would take longer to complete than an upgrade.
- Secondly, we did not want to make the large capital outlay for an entirely new hardware platform. We did add some additional capacity to make parts of the testing and validation process easier, but many clusters can upgrade with no additional hardware.
- Finally, we have many workloads that are deeply interconnected. Part of the reason we run a single multi-tenant cluster is to make it possible to join data from different departments and get a full picture of our business. This architecture makes it harder to migrate workloads to a new cluster, as many workloads need to migrate at once.
For these reasons, an in-place upgrade made sense for us. Once you have decided between an in-place upgrade and a side-car migration, you can begin the in-depth preparation process.
Following our decision to upgrade, we went through the upgrade docs to determine what preparations we would need to make. In addition to the infrastructure changes mentioned above, we had a number of jobs to move from Spark 1 to Spark 2. There were also many Cloudera Search collections we would need to transition and some changes related to HBase 2. We went through each workload and estimated how much time it would take to get the CDH deployment ready for CDP. We also read through all of the upgrade requirements and documentation to gather a list of system-level changes (like operating system upgrades) and estimated those as well. These estimates turned into a project plan for the upgrade, which you can supplement with our upgrade checklists.
Performing the Upgrade
Upgrading our CDH cluster to CDP took only a day; preparing ourselves for a successful upgrade took much longer. Our project plan spanned three months, with most of the time spent preparing workloads for the upgrade, testing them on a CDP environment, and incrementally making the system-level changes we needed to be ready. We went through multiple test upgrades on non-production environments to be as ready as possible for issues that might arise during the upgrade itself.
In general, we tried to complete any work that did not depend on the upgrade before the production upgrade day. For example, we moved all of our Spark 1 jobs to Spark 2 and deleted Spark 1 before the upgrade. We could have treated the move to Spark 2 as a deployment action shortly after the upgrade, but we decided that whenever a change could be made in advance, we would make it early to simplify what happened on the upgrade day.
We took a pre-upgrade downtime in production to accomplish some of the prerequisite tasks, like database and operating system upgrades on our master hosts. That downtime also allowed us to test the disaster recovery environment that our 24×7 users would interact with during the production upgrade. We wanted to do these tasks in advance so we could focus only on upgrading from CDH to CDP when the time came.
The upgrade itself took us a full day. We were careful to follow the upgrade instructions diligently. These provide a number of checks and backups designed to stop you from getting into bad situations and to help you recover if something goes wrong. We hit a few issues in the upgrade process, but our experienced operations team was able to work them out quickly, using the same support and escalation procedures available to our customers. We filed cases with Cloudera Support for the problems we ran into in our environments so that they can be fixed before you upgrade.
Life on CDP
We won’t pretend that everything worked perfectly when we got to CDP, but thanks to our workload test and migration process, almost all of our applications – and all of our 24×7 services – worked smoothly shortly after the upgrade. We experienced a long trickle of issues from workloads that we had missed in the planning (it’s hard to keep track of 10 years of development by dozens of people) or that needed slight tweaks. Fortunately, with the upgrade preparations documented and fresh in our minds, we could quickly solve most of the issues. When issues did arise, we could also reach out to Cloudera’s excellent Support or Professional Services teams for assistance.
The main adjustment we had to make was around the new features. For us, that was primarily Ranger. After many years of Sentry, we had to revise some of our internal processes around how we handle authorization requests, but overall the changes have been for the better. We can finally set user-level access!
The most exciting part of life on CDP is the opportunity to embrace the new tools available to us. We are working to set up products like Cloudera Machine Learning, Cloudera Data Catalog, and Cloudera Data Warehouse to improve our capabilities around data discovery, advanced data modeling, and resource isolation. We look forward to sharing more about our use of those tools in future blogs.
Lessons Learned
Here are three takeaways from our experience that may help you prepare for the move to CDP:
- You will need a test environment running CDP. We used a template from Cloudera’s Professional Services team to quickly build a test cluster to run our code to ensure it would work on CDP. Because we expected we would need a few months to make workload changes, we did not want to upgrade our development environment immediately, so we added a temporary CDP cluster for testing.
- Understand your workloads. Your cluster may run many distinct workloads. It is critical to know what they are and understand how each of them may be affected by the move to CDP. We went through each workload on our cluster and noted whether it would need to be redeployed after the upgrade, whether we had to change code, and who the point of contact was if we had an issue. The workloads that had issues after our upgrade were the ones that were poorly documented or poorly understood.
- Communicate early and often. We have hundreds of daily users across Cloudera, each of whom would be affected by the upgrade in different ways. A user with a scheduled job in Cloudera Data Science Workbench might need to make changes in preparation for CDP, while a user of our applications may just need to be aware of the planned downtime. Communicating to the users is essential to having a smooth experience. As soon as we had a rough estimate of the upgrade timing, we started communicating to our users and provided frequent updates as things changed.
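As an illustration of the "Understand your workloads" takeaway, a workload inventory could be tracked with something as small as the sketch below. This is not the tooling we used; the `Workload` fields, the `at_risk` helper, and all workload and owner names are hypothetical, chosen only to mirror the three things we noted for each workload (redeploy needed, code change needed, point of contact).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One row of a hypothetical pre-upgrade workload inventory."""
    name: str
    owner: str               # point of contact if something breaks
    needs_redeploy: bool     # must be redeployed after the upgrade
    needs_code_change: bool  # code must change before the upgrade
    documented: bool = True  # False marks a workload nobody fully understands


def at_risk(workloads):
    """Return workloads that need attention before upgrade day."""
    return [w for w in workloads if w.needs_code_change or not w.documented]


# Illustrative entries only; names and owners are made up.
inventory = [
    Workload("support-case-ui", "support-eng",
             needs_redeploy=True, needs_code_change=False),
    Workload("nightly-spark-etl", "data-eng",
             needs_redeploy=True, needs_code_change=True),
    Workload("legacy-report", "unknown",
             needs_redeploy=False, needs_code_change=False, documented=False),
]

for w in at_risk(inventory):
    print(f"{w.name}: contact {w.owner}")
```

Flagging undocumented workloads alongside those needing code changes reflects our experience that the poorly understood workloads were the ones that caused trouble after the upgrade.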
If you’d like to hear more about our upgrade story, we’re hosting a webinar titled ‘How Cloudera is Driving Quicker Business Insights by Migrating to CDP’ to discuss this in more detail. Please register here to join us.
To plan your upgrade or migration to CDP Private Cloud Base, please contact your Cloudera account team, who will set up some time to walk through the available options with you. Additionally, here are some helpful resources: