Secrets of Cloudera Support: The Champagne Strategy
At Cloudera, we take great pride in drinking our own champagne, and that pride extends to our support team in particular.
Cloudera Manager, our end-to-end management platform for CDH (Cloudera’s open-source, enterprise-ready distribution of Apache Hadoop and related projects), has a feature that allows subscription customers to send us a snapshot of their cluster. These snapshots land in a CDH cluster at Cloudera, where we can run various forms of data processing and aggregation on them.
Today, the system provides real-time support via an application we call Cloudera Support Interface (CSI). When a support employee looks at a ticket, they can use CSI to examine the customer’s latest snapshot and see cluster stats such as version information, number of nodes in service, which services are used, and so on. CSI also visualizes different aggregations and groupings, for example by version, which helps us detect misconfigured clusters or issues introduced during upgrade or installation.
The system collects a lot of information, including:
- Configuration data and historical information from Cloudera Manager
- Configuration data from each individual node
- Log information from each node, with configurable log levels to control the size
- Per-node outputs of various command-line utilities
The information is parsed, aggregated, and stored in Apache HBase, which provides real-time information to the CSI application. Custom analytic queries are available to support employees upon request via JIRA.
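To make the parse-and-aggregate step more concrete, here is a minimal sketch in plain Python. It is not the actual CSI pipeline: the snapshot layout, the field names (`cluster_id`, `cdh_version`, `in_service`, `services`), the sample values, and both helper functions are assumptions made up for illustration, and the HBase storage step is left out entirely.

```python
from collections import defaultdict

# Hypothetical, already-parsed snapshots. The real snapshot format is not
# described in this post, so every field name and value here is invented.
SNAPSHOTS = [
    {"cluster_id": "c1", "cdh_version": "4.x", "nodes": [
        {"in_service": True, "services": ["hdfs", "hbase"]},
        {"in_service": False, "services": ["hdfs"]},
    ]},
    {"cluster_id": "c2", "cdh_version": "4.x", "nodes": [
        {"in_service": True, "services": ["hdfs", "mapreduce"]},
    ]},
    {"cluster_id": "c3", "cdh_version": "3.x", "nodes": [
        {"in_service": True, "services": ["hdfs"]},
    ]},
]


def summarize_snapshot(snapshot):
    """Reduce one snapshot to the per-cluster stats a support view might show."""
    nodes = snapshot["nodes"]
    return {
        "cluster_id": snapshot["cluster_id"],
        "cdh_version": snapshot["cdh_version"],
        # Count only nodes that are flagged as in service.
        "nodes_in_service": sum(1 for node in nodes if node["in_service"]),
        # Union of the services seen on any node, sorted for stable output.
        "services": sorted({svc for node in nodes for svc in node["services"]}),
    }


def group_by_version(summaries):
    """Group cluster IDs by version, in the order the summaries arrive."""
    groups = defaultdict(list)
    for summary in summaries:
        groups[summary["cdh_version"]].append(summary["cluster_id"])
    return dict(groups)


summaries = [summarize_snapshot(s) for s in SNAPSHOTS]
by_version = group_by_version(summaries)
```

A grouping like `by_version` is the kind of view that could make an outlier stand out, such as a single cluster left on an old version after an upgrade.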
In the near future, however, this data will move to Cloudera Impala (the new real-time query engine for Hadoop, in beta at the time of this writing) over HDFS for analytic use cases, at which point the support organization will use a BI tool to run self-service queries. For example, we could explore questions like:
- What is the distribution of workloads across Impala, Apache Hive, and HBase?
- Which OS versions are most commonly used?
- What are the mean and variance of hardware configurations?
- How many types of hardware configuration are there at a single customer site?
- Does anyone use that weirdo parameter that we want to deprecate?
- What are the most commonly encountered errors?
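A few of the questions above can be sketched in plain Python to show the shape of the analysis. This is only an illustration: the plan described here is to run such queries through Impala and a BI tool, and the node records, field names (`customer`, `os`, `cores`, `ram_gb`, `errors`), sample values, and helper functions below are all invented for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

# Hypothetical per-node records; none of these values come from real clusters.
NODES = [
    {"customer": "a", "os": "CentOS 6.3", "cores": 8, "ram_gb": 64,
     "errors": ["OutOfMemoryError"]},
    {"customer": "a", "os": "CentOS 6.3", "cores": 8, "ram_gb": 64,
     "errors": []},
    {"customer": "a", "os": "CentOS 6.3", "cores": 16, "ram_gb": 128,
     "errors": ["OutOfMemoryError", "ConnectException"]},
    {"customer": "b", "os": "Ubuntu 12.04", "cores": 8, "ram_gb": 64,
     "errors": ["OutOfMemoryError"]},
]


def os_distribution(nodes):
    """Which OS versions are most commonly used?"""
    return Counter(node["os"] for node in nodes)


def ram_stats(nodes):
    """Mean and population variance of one hardware dimension (RAM in GB)."""
    values = [node["ram_gb"] for node in nodes]
    return mean(values), pvariance(values)


def hardware_types_per_customer(nodes):
    """How many distinct (cores, RAM) configurations does each customer run?"""
    configs = defaultdict(set)
    for node in nodes:
        configs[node["customer"]].add((node["cores"], node["ram_gb"]))
    return {customer: len(kinds) for customer, kinds in configs.items()}


def top_errors(nodes, k):
    """What are the k most commonly encountered errors?"""
    counts = Counter(err for node in nodes for err in node["errors"])
    return counts.most_common(k)


os_counts = os_distribution(NODES)
ram_mean, ram_var = ram_stats(NODES)
hw_types = hardware_types_per_customer(NODES)
worst = top_errors(NODES, 1)
```

Each helper maps to one bullet in the list, so the same questions could presumably be expressed as grouped counts and aggregates in SQL once the data is queryable through Impala.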
The end result? Better support, better products, and happier customers. Now, that’s what I call a good vintage.