Delivering transformational innovation and making accurate business decisions requires harnessing the full potential of your organization’s entire data ecosystem. Ultimately, this comes down to the reliability and trustworthiness of the underlying data that feeds your insights and applications. This is especially true of modern generative AI solutions, which are particularly reliant on trusted, accurate, and context-specific data.
Implementing the right platform is half the battle won, so congratulations on choosing Cloudera’s industry-leading hybrid data platform to build your data solutions on a foundation of trusted data. The other half of the equation requires your team to shift its focus to sustained excellence in managing and optimizing your data ecosystem, better known as Day 2 operations. In this blog, we’ll cover the highlights of our recently published Day 2 Operations Guide and why it matters to enterprises.
In the fast-paced world of cloud-native products, mastering Day 2 operations is crucial for sustaining the performance and stability of Kubernetes-based platforms such as CDP Private Cloud Data Services. Day 2 operations are akin to the housekeeping of a software system: vital for maintaining its health and stability. At Cloudera, our commitment to excellence extends beyond design and deployment on Day 0 and Day 1 and into the critical phase of system maintenance and optimization.
Before delving into Day 2 operations for Cloudera on private cloud, let’s quickly demystify the jargon and define what these “days” mean.
- Day 0 (Design and Preparation): Focuses on designing and preparing for your installation, including gathering requirements, planning the architecture, allocating resources, setting up networking and security, and creating documentation.
- Day 1 (Deployment and Migration): Involves the actual deployment and initial configuration of the platform, including installation, configuration, testing, troubleshooting, and setting up monitoring tools, as well as migrating your data and workloads onto the platform.
- Day 2 (Operations and Optimization): Focuses on ongoing platform operations, including regular maintenance, user support, performance tuning, scaling, security monitoring, and updating documentation.
We’ve included a more detailed example of what Days 0, 1, and 2 involve in the appendix if you’re interested. You were right if you guessed that these key steps won’t necessarily all happen in a day!
To sum up, Day 2 operations involve meticulous attention to regular maintenance, proactive user support, and ongoing performance tuning. This is the stage where scalability becomes a reality, as the platform adapts to growing data and user demands while security measures are continuously strengthened. It is also a period of dynamic adaptation, in which documentation and operational protocols evolve as your data and technology landscape changes.
How does Cloudera support Day 2 operations?
For a cloud-native data platform that supports data warehousing, data engineering, and machine learning workloads launched by potentially thousands of concurrent users, upgrades, scaling, troubleshooting, backup and restore, and security are crucial. Cloudera on private cloud is designed to manage these and more automatically. The rest of this blog covers how the platform supports health checks, monitoring, and troubleshooting when issues arise, as well as backup and restore.
Cloudera offers a multi-faceted approach to health checks, monitoring, and troubleshooting, including:
- Environment health checks, host-level health checks, data backup, and proactive monitoring and alerting. This blog summarizes these capabilities, and the Day 2 Operations Guide walks you through every step in detail. Administrators can run these health and environment checks as an action command from the control plane UI.
- Component-level status indicators that show the state of the platform: healthy, warning, or critical. Administrators can configure the alert thresholds on the control plane, tailoring the warning and critical alerts for specific health checks to their own environment.
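To make the healthy/warning/critical model concrete, here is a minimal sketch of how configurable thresholds map a single metric reading to a status. It is not Cloudera's implementation. The `classify` function, the `DISK_USAGE_THRESHOLDS` values, and the assumption that higher readings are worse are all hypothetical and are shown only to illustrate why per-environment thresholds matter.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float) -> Status:
    """Map a metric reading to a status, assuming higher readings are worse.

    `warning` and `critical` stand in for the per-check thresholds an
    administrator would configure on the control plane (hypothetical here).
    """
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.HEALTHY


# Hypothetical disk-usage check (percent), tuned for one customer environment.
DISK_USAGE_THRESHOLDS = {"warning": 80.0, "critical": 95.0}

for reading in (42.0, 85.5, 97.0):
    print(reading, classify(reading, **DISK_USAGE_THRESHOLDS).value)
```

With these example thresholds, the loop reports the three readings as healthy, warning, and critical. Raising or lowering the two numbers is the sketch's equivalent of tuning a health check for a specific environment.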
Monitoring and alerting
Proactive monitoring is key to maintaining a healthy and efficient Kubernetes environment. Cloudera’s data services on private cloud allow administrators to define custom alert rules based on PromQL expressions. These rules are designed to automatically trigger alerts when specific events occur, ensuring that any potential issues are promptly identified and addressed. These alerts can be viewed on the management console dashboard, and configured alert receivers can send notifications to specified endpoints, keeping the team informed and responsive.
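On the receiving end, an alert receiver is ultimately an endpoint that accepts notifications. The sketch below assumes the configured receiver POSTs JSON in the Prometheus Alertmanager webhook format (an `alerts` list whose entries have `status`, `labels`, and `annotations`). Treat that format as an assumption about your setup, not a documented Cloudera contract. The `summarize_alerts` function, the `AlertReceiver` class, and the sample alert names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any


def summarize_alerts(payload: dict[str, Any]) -> list[str]:
    """Return one line per *firing* alert in an Alertmanager-style webhook body."""
    lines = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # ignore alerts that have already resolved
        labels = alert.get("labels", {})
        line = f"[{labels.get('severity', 'unknown')}] {labels.get('alertname', '<unnamed>')}"
        summary = alert.get("annotations", {}).get("summary")
        if summary:
            line += f": {summary}"
        lines.append(line)
    return lines


class AlertReceiver(BaseHTTPRequestHandler):
    """Minimal HTTP endpoint that accepts POSTed alert notifications."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize_alerts(payload):
            print(line)  # swap for a pager, chat, or ticketing integration
        self.send_response(200)
        self.end_headers()


# Sample body with one firing and one resolved alert (values are made up).
SAMPLE = {
    "version": "4",
    "status": "firing",
    "receiver": "ops-team",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "PodRestarting", "severity": "warning"},
         "annotations": {"summary": "Pod restarted 4 times in 15 minutes"}},
        {"status": "resolved",
         "labels": {"alertname": "NodeNotReady", "severity": "critical"},
         "annotations": {}},
    ],
}

print(summarize_alerts(SAMPLE))
```

To actually listen for notifications, you could run `HTTPServer(("", 9099), AlertReceiver).serve_forever()` (with `HTTPServer` imported from `http.server`; the port is arbitrary) and point a webhook-style receiver at that address. Only the firing alert appears in the summary, which keeps resolved noise out of the team's channel.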
The steps below illustrate configuring a custom alert for a Cloudera Data Services installation using PromQL expressions.
Navigate to the management console.
To add a custom alert rule, click the “Add Alert Rule” button and make sure the following required fields are populated; the others are optional:
- Enable Alert
- For Cause
- Workload Type
- PromQL Expression
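As a rough model of that form, the sketch below represents a rule with the four required fields and checks that none of the text fields is left blank before saving. This is a hypothetical data model. The field names mirror the form labels, but the types, the `optional` dict, and the example values are assumptions. The PromQL expression uses `kube_pod_container_status_restarts_total`, a kube-state-metrics counter, and whether that metric is scraped in your installation is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """Hypothetical model of the "Add Alert Rule" form.

    Field names mirror the form labels; the types, and the catch-all
    `optional` dict, are assumptions rather than Cloudera's schema.
    """

    enable_alert: bool
    for_cause: str
    workload_type: str
    promql_expression: str
    optional: dict[str, str] = field(default_factory=dict)

    def missing_required(self) -> list[str]:
        """Names of required text fields that are empty or whitespace-only."""
        required = {
            "for_cause": self.for_cause,
            "workload_type": self.workload_type,
            "promql_expression": self.promql_expression,
        }
        return [name for name, value in required.items() if not value.strip()]


# Example rule: fire when a container restarts more than 3 times in 15 minutes.
rule = AlertRule(
    enable_alert=True,
    for_cause="container restart loop",  # hypothetical value
    workload_type="example-workload",    # hypothetical value
    promql_expression="increase(kube_pod_container_status_restarts_total[15m]) > 3",
)
print(rule.missing_required() or "all required fields populated")
```

The expression follows the usual PromQL alerting pattern of a range-based function compared against a threshold. Any expression that returns a non-empty result when something is wrong can serve the same role.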
For platform data protection, Cloudera offers an out-of-the-box data recovery system (DRS) that enables administrators to back up and restore the Kubernetes platform. Cloudera recommends taking a backup before any maintenance activity or upgrade to mitigate risk and to restore the environment if needed. Backup operations can also run while the cluster is up without impacting running workloads, so customers can run backups periodically or on demand, during business hours as well as in maintenance windows.
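One way to enforce the "back up before maintenance" recommendation is a simple pre-flight gate in your upgrade runbook. The sketch below does not call DRS. It assumes you can obtain backup completion times from your own tooling, and the `has_recent_backup` helper, the 24-hour default window, and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def has_recent_backup(
    backups: list[datetime],
    maintenance_start: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> bool:
    """True if at least one backup completed within `max_age` before maintenance.

    `max_age` is a hypothetical policy knob; choose a window that matches how
    much change your environment can afford to lose.
    """
    window_start = maintenance_start - max_age
    return any(window_start <= done <= maintenance_start for done in backups)


# Hypothetical backup completion times pulled from your backup tooling.
completed = [datetime(2024, 5, 1, 2, 0), datetime(2024, 5, 8, 2, 0)]
upgrade_at = datetime(2024, 5, 8, 20, 0)

if not has_recent_backup(completed, upgrade_at):
    raise SystemExit("Take a fresh backup before starting the upgrade.")
print("Recent backup found; OK to proceed.")
```

Because backups can run while workloads are active, a failed check like this can usually be resolved by taking a fresh backup on the spot rather than postponing the maintenance window.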
Cloudera supports customers throughout their operational life cycle by focusing on continuous improvement, optimization, and adaptation. This ongoing support is crucial in a landscape where data requirements and interactions are constantly growing and evolving. Day 2 operations are pivotal in maintaining the platform’s stability and improving the experience of users within the cluster. These operations ensure a seamless, efficient, and reliable service, which directly affects tenant satisfaction and trust in the platform.
Check out the Day 2 Operations Guide as you plan your upgrades to Cloudera’s Data Services on private cloud and bookmark it for future reference as you operate your state-of-the-art data platform. Stay tuned for upcoming blogs on managing Day 0 and Day 1 operations to optimize your upgrade.
Appendix: Days 0, 1, and 2
| Day 0 (Design & Preparation) | Day 1 (Deployment) | Day 2 (Operations & Optimization) |
|---|---|---|