Tracing with Apache Avro
Written by Patrick Wendell, an amazing summer intern with Cloudera and an Avro Committer.
In my summer internship project at Cloudera, I added RPC tracing as a first-order feature of Apache Avro. Avro is a platform for data storage and exchange that caters to data-intensive, dynamic applications. My project focused on Avro’s RPC functionality.
It is common knowledge that tracing in distributed systems can be difficult. In user-facing web services, a front-end function may recursively trigger several function calls to mid and back-tier services. In offline processing, data-center storage layers may distribute data across several hosts, querying one or many of them when a client requests a file. In either case, the inter-dependency of components makes it difficult to pinpoint the source of a slowdown or hang-up when they inevitably occur.
AvroTrace is designed as a first responder for diagnosing problems in distributed systems that use Avro for RPC transport. It has two components, a real-time monitoring dashboard and an offline trace analyzer. Both run as low-overhead Avro plugins which store and propagate tracing meta-data among RPC clients and servers. The monitoring dashboard is accessible via a web interface on any Avro server, delivering a “snapshot” of the most recent RPC activity. The offline analysis tool offers a basic interface for collecting, aggregating, and analyzing this data to identify problem spots. It is largely based on Google’s Dapper tracing infrastructure, which is itself inspired by X-Trace and other academic tracing research.
Below is an example trace analysis of a recursive RPC call pattern. In the example application, one remote call, getFile() triggers two other RPC’s, getFileContents() and getFileMeta(). Avro’s tracing has detected this particular pattern and offers a dashboard view summarizing average timing and payload data. It is also showing detailed graphs for one of the specific nodes in this pattern, getFileContents() presenting a visual history of timing (top) and payload (bottom) analytics.
Turnkey tracing is just one of many reasons to use Avro. I recently became a committer on the Avro project and I look forward to supporting and improving trace functionality in the coming months!
*Click on any of the graphs or stats for a larger version