Tracing with Apache Avro

Written by Patrick Wendell, an amazing summer intern with Cloudera and an Avro Committer.

 

In my summer internship project at Cloudera, I added RPC tracing as a first-class feature of Apache Avro. Avro is a platform for data storage and exchange that caters to data-intensive, dynamic applications. My project focused on Avro’s RPC functionality.

Tracing in distributed systems is notoriously difficult. In user-facing web services, a front-end function may recursively trigger several function calls to mid- and back-tier services. In offline processing, data-center storage layers may distribute data across several hosts, querying one or many of them when a client requests a file. In either case, the inter-dependency of components makes it difficult to pinpoint the source of a slowdown or hang-up when one inevitably occurs.

AvroTrace is designed as a first responder for diagnosing problems in distributed systems that use Avro for RPC transport. It has two components: a real-time monitoring dashboard and an offline trace analyzer. Both run as low-overhead Avro plugins that store and propagate tracing metadata among RPC clients and servers. The monitoring dashboard is accessible via a web interface on any Avro server, delivering a “snapshot” of the most recent RPC activity. The offline analysis tool offers a basic interface for collecting, aggregating, and analyzing this data to identify problem spots. AvroTrace is largely based on Google’s Dapper tracing infrastructure, which is itself inspired by X-Trace and other academic tracing research.

Below is an example trace analysis of a recursive RPC call pattern. In the example application, one remote call, getFile(), triggers two other RPCs, getFileContents() and getFileMeta(). Avro’s tracing has detected this particular pattern and offers a dashboard view summarizing average timing and payload data. It also shows detailed graphs for one of the specific nodes in this pattern, getFileContents(), presenting a visual history of timing (top) and payload (bottom) analytics.
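To make the idea of a detected call pattern concrete, here is a minimal Python sketch of Dapper-style span bookkeeping for the getFile() example. It is not AvroTrace's actual code or API: the Span class, the record(), pattern(), and average_durations() helpers, and the timing numbers are all hypothetical, chosen only to illustrate how parent/child span IDs might let an analyzer group traces by shape and average their timings.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

_ids = itertools.count(1)  # hypothetical ID source; a real tracer would use random IDs


@dataclass
class Span:
    trace_id: int             # shared by every span in one end-to-end request
    span_id: int              # unique per RPC
    parent_id: Optional[int]  # None for the root RPC
    name: str
    duration_ms: float


def record(spans, name, duration_ms, parent=None):
    """Create a span; a child inherits its trace_id from the parent span."""
    span_id = next(_ids)
    trace_id = parent.trace_id if parent else span_id
    parent_id = parent.span_id if parent else None
    span = Span(trace_id, span_id, parent_id, name, duration_ms)
    spans.append(span)
    return span


def pattern(spans, trace_id):
    """Return a canonical signature of one trace's call tree."""
    children = defaultdict(list)
    root = None
    for s in spans:
        if s.trace_id != trace_id:
            continue
        if s.parent_id is None:
            root = s
        else:
            children[s.parent_id].append(s)

    def signature(s):
        kids = sorted(signature(c) for c in children.get(s.span_id, []))
        return s.name + ("(" + ",".join(kids) + ")" if kids else "")

    return signature(root)


def average_durations(spans):
    """Average duration per (call pattern, RPC name) across all traces."""
    samples = defaultdict(list)
    for s in spans:
        samples[(pattern(spans, s.trace_id), s.name)].append(s.duration_ms)
    return {key: sum(v) / len(v) for key, v in samples.items()}


# Two made-up traces with the same shape as the example application.
spans = []
for contents_ms, meta_ms in [(40.0, 5.0), (60.0, 7.0)]:
    root = record(spans, "getFile", contents_ms + meta_ms + 1.0)
    record(spans, "getFileContents", contents_ms, parent=root)
    record(spans, "getFileMeta", meta_ms, parent=root)

averages = average_durations(spans)
for (pat, name), avg in sorted(averages.items()):
    print(f"{pat:45s} {name:16s} {avg:6.1f} ms")
```

Both traces collapse to the single signature getFile(getFileContents,getFileMeta), so their timings can be averaged per node. Sorting the child signatures is a simplification that treats sibling order as irrelevant; a real analyzer might choose differently.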

Turnkey tracing is just one of many reasons to use Avro. I recently became a committer on the Avro project, and I look forward to supporting and improving trace functionality in the coming months!



Learn more about Avro and other Hadoop projects at Hadoop World!

3 Responses
  • Jonas / September 15, 2010 / 3:23 AM

    Very cool.
    When is this available?
    Is it production ready?
    Is it possible to add custom annotations like in Dapper?
    Are you consolidating the stats like in Dapper?
    Thanks.

  • Philip Zeyliger / September 16, 2010 / 11:11 AM

    Hi Jonas,

    This was work done as part of https://issues.apache.org/jira/browse/AVRO-595 and made it into the recently-built Avro 1.4.0 release.

    Thus far, it’s only been used internally, so it’s still in the early phases. I encourage you to try it!

    There’s underlying support for custom annotations, but there isn’t a front-end API for annotations yet.

    Whereas Dapper relies on BigTable to aggregate statistics, Avro doesn’t rely on any particular data store. Instead there is a basic mechanism for pulling tracing data from each node. Once the tracing data is in the same place, similar logic to Dapper is used to infer common trace patterns.

    Cheers,

    – Philip (and Patrick)
