Cloudera’s own enterprise data hub is yielding great results for providing world-class customer support.
Here at Cloudera, we are constantly pushing the envelope to give our customers world-class support. One of the cornerstones of this effort is the Cloudera Support Interface (CSI), which we’ve described in prior blog posts (here and here). Through CSI, our support team is able to quickly reason about a customer’s environment, search for information related to a case currently being worked, and much more.
In this post, I’m happy to write about a new feature in CSI, which we call Monocle Stack Trace.
Stack Trace Exploration with Search
Hadoop log messages and the stack traces in those logs are critical information in many of the support cases Cloudera handles. Our customer operations engineers (COEs) regularly search for stack traces referenced in support cases to determine where else a given stack trace has shown up, and in what context it occurs. That could be in any of the many sources we were already indexing as part of Monocle Search in CSI: Apache JIRAs, Apache mailing lists, internal Cloudera JIRAs, internal Cloudera mailing lists, support cases, Knowledge Base articles, Cloudera Community Forums, and the customer diagnostic bundles we get from Cloudera Manager.
It turns out that routine document searches don't always yield the best results for stack traces. Stack traces are long compared to normal search terms, so search indexes won't always return the relevant results in the order you would expect. It's also hard for a user to churn through the results to determine whether the stack trace was actually an exact match in a document, and thus how relevant that document really is.
To solve this problem, we took an approach similar to what Google does when it wants to allow searching over a type that isn't well suited to normal document search (such as images): we created an independent index and search result page for stack-trace searches. In Monocle Stack Trace, the search results show a list of unique stack traces, each grouped with every source of data in which that unique stack trace was discovered. Each source can be viewed in-line in the search result page, or the user can go to it directly by following a link.
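The grouping idea can be sketched in a few lines of Python. The key scheme shown here (hashing the verbatim frame lines of the call stack) is an illustrative assumption; the post doesn't specify how CSI actually derives its unique key.

```python
import hashlib
from collections import defaultdict

def trace_key(trace_text):
    """Derive a stable key from the call-stack lines of a trace.
    Hypothetical scheme for illustration: hash the 'at ...' frame
    lines verbatim, so traces with identical call stacks collide."""
    frames = [ln.strip() for ln in trace_text.splitlines()
              if ln.strip().startswith("at ")]
    return hashlib.sha1("\n".join(frames).encode("utf-8")).hexdigest()

def group_by_trace(occurrences):
    """occurrences: iterable of (source_id, trace_text) pairs,
    e.g. ('CASE-1234', trace). Returns a mapping from trace key to
    the list of sources in which that unique trace was found,
    mirroring the grouped search-result page."""
    groups = defaultdict(list)
    for source_id, trace_text in occurrences:
        groups[trace_key(trace_text)].append(source_id)
    return dict(groups)
```

With this shape, rendering the result page is just iterating over the groups: one entry per unique stack trace, with its sources listed beneath it.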
We also give visual hints as to how the stack trace for which the user searched differs from the stack traces that show up in the search results. A green highlighted line in a search result indicates a matching call stack line. Yellow indicates a call stack line that only differs in line number, something that may indicate the same stack trace on a different version of the source code. A screenshot showing the grouping of sources and visual highlighting is below:
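The green/yellow highlighting described above boils down to a per-line classification. Here is a minimal sketch of that comparison in Python; the frame-parsing regex and the label names are assumptions made for illustration, not the actual CSI code.

```python
import re

# Parses a Java stack frame such as:
#   at com.example.Foo.bar(Foo.java:42)
# 'line' is absent for frames like '(Native Method)'.
FRAME = re.compile(r"at\s+(?P<loc>[\w.$]+)\((?P<file>[^:)]+)(?::(?P<line>\d+))?\)")

def classify_line(query_line, result_line):
    """Return 'green' for an exact frame match, 'yellow' when only the
    line number differs (same code, possibly a different source
    version), or None when the frames don't match at all."""
    q, r = FRAME.search(query_line), FRAME.search(result_line)
    if not q or not r:
        # Non-frame lines (e.g. the exception message) only match exactly.
        return "green" if query_line.strip() == result_line.strip() else None
    if q.group("loc") == r.group("loc") and q.group("file") == r.group("file"):
        return "green" if q.group("line") == r.group("line") else "yellow"
    return None
```

Applying `classify_line` pairwise down the two call stacks yields the per-line highlighting shown in the screenshot.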
The high-level implementation details are as follows:
- Every data source we fetch as part of standard Monocle Search indexing is marked for stack-trace processing.
- Every hour, a series of MapReduce jobs run to find and extract stack traces from the sources we’ve fetched.
- For each stack trace found, we create a unique key from the call stack that uniquely identifies the exception, and do a lookup in Apache HBase using that key. If there's already a row, we append the source in which we found the stack trace to that row. If not, we create a new row in HBase and insert the new call stack into the Search index.
- When a search is executed, each unique stack trace that Solr sees as a match in the index is returned.
- For each stack trace returned, we then do a lookup against HBase to find the sources that stack trace has been found in.
- The UI then does a line-by-line comparison of the call stacks, highlighting each call-stack line in each search result appropriately.
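The extraction step above can be sketched as a simple scan over log text: an exception header line followed by one or more indented `at ...` frames constitutes one trace. This is a simplified stand-in for the MapReduce extraction jobs (real log lines carry timestamps and other prefixes, and the real jobs run at scale); the regexes are assumptions for illustration.

```python
import re

# An exception header, optionally a 'Caused by:' continuation, e.g.:
#   java.io.IOException: Broken pipe
EXC_LINE = re.compile(r"^(?:Caused by:\s+)?([\w.$]+(?:Exception|Error))(?::.*)?$")
# An indented stack frame, e.g.:
#       at org.apache.hadoop.hdfs.DFSOutputStream.run(DFSOutputStream.java:600)
AT_LINE = re.compile(r"^\s+at\s+[\w.$]+\(.*\)")

def extract_stack_traces(log_text):
    """Pull contiguous stack-trace blocks out of raw log text.
    Returns each trace as a single newline-joined string."""
    traces, current = [], []

    def flush():
        if len(current) > 1:  # keep only header plus at least one frame
            traces.append("\n".join(current))

    for raw in log_text.splitlines():
        line = raw.strip()
        if current and (AT_LINE.match(raw) or line.startswith("Caused by")):
            current.append(line)
        else:
            flush()
            current = [line] if EXC_LINE.match(line) else []
    flush()
    return traces
```

In the pipeline sketched above, each extracted trace would then be keyed and looked up in HBase to record the source it came from.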
I’ve left out some details about optimizations we’ve made, such as our use of HBase’s bulk loading functionality for the extraction of stack traces from some of our larger data sources, but those aren’t critical to understanding the high-level data flow.
On launch day, we received feedback that Monocle Stack Trace was especially useful when a COE was presented with a stack trace they knew they had seen in a prior support case. Grouping and visualizing the matches brought our COEs right to the case being recalled, instead of forcing them to sift through the long list of fuzzy results a regular document search would yield. The result was less time spent searching and more time spent solving the problem.
We feel this application shows the power of an enterprise data hub (EDH). By having multiple strategies for storing, accessing, and processing data within our EDH, we can build innovative applications that solve problems in new ways.
This application goes well beyond simple indexing and searching: we use Cloudera Search, HBase, and MapReduce together to process, store, and visualize stack traces in ways that wouldn't be possible with a search index alone. Its integration with the larger CSI application goes further still. It's a great feeling to execute a search in Monocle Stack Trace that links directly to a point in time in a customer log file, located by an Impala query that churned through tens of GBs of data, all done interactively from a web UI in a second or two. At Cloudera, we strongly believe in investing in these kinds of applications to give our COEs the extra edge they need to provide world-class support.
Adam Warrington is an Engineering Manager on the customer operations team at Cloudera.