Tracing with Apache Avro

Categories: Avro

Written by Patrick Wendell, an amazing summer intern with Cloudera and an Avro Committer.


In my summer internship project at Cloudera, I added RPC tracing as a first-class feature of Apache Avro. Avro is a platform for data storage and exchange that caters to data-intensive, dynamic applications. My project focused on Avro’s RPC functionality.

Tracing in distributed systems is widely known to be difficult. In user-facing web services, a front-end function may recursively trigger several function calls to mid- and back-tier services. In offline processing, data-center storage layers may distribute data across several hosts, querying one or many of them when a client requests a file. In either case, the interdependency of components makes it difficult to pinpoint the source of a slowdown or hang-up when one inevitably occurs.

AvroTrace is designed as a first responder for diagnosing problems in distributed systems that use Avro for RPC transport. It has two components: a real-time monitoring dashboard and an offline trace analyzer. Both run as low-overhead Avro plugins which store and propagate tracing metadata among RPC clients and servers. The monitoring dashboard is accessible via a web interface on any Avro server, delivering a “snapshot” of the most recent RPC activity. The offline analysis tool offers a basic interface for collecting, aggregating, and analyzing this data to identify problem spots. AvroTrace is largely based on Google’s Dapper tracing infrastructure, which is itself inspired by X-Trace and other academic tracing research.
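To make the idea of propagating tracing metadata concrete, here is a minimal sketch in Python of Dapper-style propagation, where each RPC carries a trace ID, its own span ID, and its caller's span ID. This is not AvroTrace's code or API: the names `Span`, `client_send`, and `server_receive`, the metadata keys, and the in-memory `store` are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

# Deterministic ID source for the sketch; a real system would use random IDs.
_ids = itertools.count(1)


@dataclass
class Span:
    """One traced RPC (hypothetical record layout)."""
    trace_id: int             # shared by every RPC in the same call tree
    span_id: int              # unique to this RPC
    parent_id: Optional[int]  # span_id of the caller, None for the root
    name: str                 # RPC method name


def client_send(current: Optional[Span], name: str) -> Dict:
    """Client side: build the metadata attached to an outgoing request."""
    trace_id = current.trace_id if current else next(_ids)
    return {
        "trace_id": trace_id,
        "span_id": next(_ids),
        "parent_id": current.span_id if current else None,
        "name": name,
    }


def server_receive(meta: Dict, store: List[Span]) -> Span:
    """Server side: record the span locally and make it the current one."""
    span = Span(**meta)
    store.append(span)
    return span


def demo() -> List[Span]:
    """getFile() fans out to two child RPCs, all in one trace."""
    store: List[Span] = []
    root = server_receive(client_send(None, "getFile"), store)
    server_receive(client_send(root, "getFileContents"), store)
    server_receive(client_send(root, "getFileMeta"), store)
    return store
```

Because every span records its parent, the call tree can be rebuilt later from spans stored on different hosts, which is what makes offline analysis possible.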

Below is an example trace analysis of a recursive RPC call pattern. In the example application, one remote call, getFile(), triggers two other RPCs, getFileContents() and getFileMeta(). Avro’s tracing has detected this particular pattern and offers a dashboard view summarizing average timing and payload data. It also shows detailed graphs for one of the specific nodes in this pattern, getFileContents(), presenting a visual history of timing (top) and payload (bottom) analytics.
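As a rough illustration of how a call pattern might be detected and summarized, the following Python sketch groups collected spans by trace, derives a signature from each call tree, and averages timings per RPC within each pattern. It is not the analyzer's actual logic: the record fields, the function names, and the millisecond values in `SAMPLE` are invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Made-up spans from two traces of the same getFile() pattern.
SAMPLE = [
    {"trace_id": 1, "span_id": 10, "parent_id": None, "name": "getFile", "ms": 40},
    {"trace_id": 1, "span_id": 11, "parent_id": 10, "name": "getFileContents", "ms": 30},
    {"trace_id": 1, "span_id": 12, "parent_id": 10, "name": "getFileMeta", "ms": 5},
    {"trace_id": 2, "span_id": 20, "parent_id": None, "name": "getFile", "ms": 60},
    {"trace_id": 2, "span_id": 21, "parent_id": 20, "name": "getFileMeta", "ms": 7},
    {"trace_id": 2, "span_id": 22, "parent_id": 20, "name": "getFileContents", "ms": 50},
]


def pattern_of(spans):
    """Return a hashable call-tree signature for one complete trace.

    Assumes the trace has exactly one root span (parent_id is None).
    """
    children = defaultdict(list)
    root = None
    for s in spans:
        if s["parent_id"] is None:
            root = s
        else:
            children[s["parent_id"]].append(s)

    def signature(span):
        # Sort child signatures so call order does not change the pattern.
        kids = sorted(signature(c) for c in children[span["span_id"]])
        return (span["name"], tuple(kids))

    return signature(root)


def summarize(all_spans):
    """Map each call pattern to the average duration of each RPC in it."""
    by_trace = defaultdict(list)
    for s in all_spans:
        by_trace[s["trace_id"]].append(s)

    timings = defaultdict(lambda: defaultdict(list))
    for spans in by_trace.values():
        pattern = pattern_of(spans)
        for s in spans:
            timings[pattern][s["name"]].append(s["ms"])

    return {
        pattern: {name: mean(values) for name, values in per_rpc.items()}
        for pattern, per_rpc in timings.items()
    }
```

Both sample traces collapse into a single pattern even though the child calls arrive in a different order, so their timings are averaged together.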

Turnkey tracing is just one of many reasons to use Avro. I recently became a committer on the Avro project, and I look forward to supporting and improving trace functionality in the coming months!


Learn more about Avro and other Hadoop projects at Hadoop World!


3 responses on “Tracing with Apache Avro”

  1. Jonas

    Very cool.
    When is this available?
    Is it production ready?
    Is it possible to add custom annotations like in Dapper?
    Are you consolidating the stats like in Dapper?

  2. Philip Zeyliger

    Hi Jonas,

    This work made it into the recently built Avro 1.4.0 release.

    Thus far, it’s only been used internally, so it’s still in the early phases. I encourage you to try it!

    There’s underlying support for custom annotations, but there isn’t a front-end API for annotations yet.

    Whereas Dapper relies on BigTable to aggregate statistics, Avro doesn’t rely on any particular data store. Instead, there is a basic mechanism for pulling tracing data from each node. Once the tracing data is in the same place, logic similar to Dapper’s is used to infer common trace patterns.


    — Philip (and Patrick)