What is Hadoop Metrics2?

Metrics are collections of information about Hadoop daemons, events, and measurements. For example, DataNodes collect metrics such as the number of blocks replicated and the number of read requests from clients. This makes metrics an invaluable resource for monitoring Apache Hadoop services and an indispensable tool for debugging system problems.

This blog post focuses on the features and use of the Metrics2 system for Hadoop, which allows multiple metrics output plugins to be used in parallel, supports dynamic reconfiguration of metrics plugins, provides metrics filtering, and allows all metrics to be exported via JMX.

Metrics vs. MapReduce Counters

When speaking about metrics, a question about their relationship to MapReduce counters usually arises. The two differ in two ways. First, the scope of metrics is generally Hadoop daemons and services, whereas the scope of MapReduce counters is MapReduce applications (counters are collected for MapReduce tasks and aggregated for the whole job). Second, the main audience for metrics is Hadoop administrators, whereas the audience for MapReduce counters is MapReduce users.

Contexts and Prefixes

For organizational purposes, metrics are grouped into named contexts, e.g., jvm for Java Virtual Machine metrics or dfs for distributed file system metrics. Hadoop-1 and Hadoop-2 support different sets of contexts; the lists below show the ones supported by each.

Branch-1 (Hadoop-1) contexts:

- jvm
- rpc
- rpcdetailed
- metricssystem
- mapred
- dfs
- ugi

Branch-2 (Hadoop-2) contexts:

- yarn
- jvm
- rpc
- rpcdetailed
- metricssystem
- mapred
- dfs
- ugi

A Hadoop daemon collects metrics in several contexts. For example, DataNodes collect metrics for the “dfs”, “rpc” and “jvm” contexts. The daemons that collect metrics in Hadoop-1 and Hadoop-2 are listed below; each name is also the prefix that identifies the daemon's metrics:

Branch-1 daemons/prefixes:

- namenode
- datanode
- jobtracker
- tasktracker
- maptask
- reducetask

Branch-2 daemons/prefixes:

- namenode
- secondarynamenode
- datanode
- resourcemanager
- nodemanager
- mrappmaster
- maptask
- reducetask

System Design

The Metrics2 framework is designed to collect and dispatch per-process metrics to monitor the overall status of the Hadoop system. Producers register the metrics sources with the metrics system, while consumers register the sinks. The framework marshals metrics from sources to sinks based on (per source/sink) configuration options. This design is depicted below.

 

Here is an example class implementing the MetricsSource:
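A minimal sketch, assuming the Hadoop 2.x metrics2 API (package org.apache.hadoop.metrics2 in hadoop-common). The class name MyComponentSource, the record name, the context MyContext and the constant value 42 are placeholders chosen for illustration:

```java
import static org.apache.hadoop.metrics2.lib.Interns.info;

import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;

// Hypothetical source that reports a single gauge named "MyMetric".
class MyComponentSource implements MetricsSource {
  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    // Called by the metrics system on each sampling period; "all" asks the
    // source to report every metric, even those that have not changed.
    collector.addRecord("MyComponentSource")
        .setContext("MyContext")
        .addGauge(info("MyMetric", "My metric description"), 42);
  }
}
```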

The “MyMetric” in the listing above could be, for example, the number of open connections for a specific server.

Here is an example class implementing the MetricsSink:
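A minimal sketch, again assuming Hadoop 2.x, where MetricsPlugin.init receives a commons-configuration 1.x SubsetConfiguration (Hadoop 3 uses the commons-configuration2 package instead). MyComponentSink is a hypothetical name; it simply prints each metric of each record to standard output:

```java
import org.apache.commons.configuration.SubsetConfiguration;
import org.apache.hadoop.metrics2.AbstractMetric;
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsSink;

// Hypothetical sink that writes one line per metric to stdout.
class MyComponentSink implements MetricsSink {
  @Override
  public void init(SubsetConfiguration conf) {
    // Sink-specific options (e.g. test.sink.mysink0.*) arrive here; none are used.
  }

  @Override
  public void putMetrics(MetricsRecord record) {
    for (AbstractMetric metric : record.metrics()) {
      System.out.println(record.timestamp() + " " + record.context() + "."
          + record.name() + " " + metric.name() + "=" + metric.value());
    }
  }

  @Override
  public void flush() {
    // Nothing is buffered, so there is nothing to flush.
  }
}
```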

To use the Metrics2 framework, the system needs to be initialized and sources and sinks registered. Here is an example initialization:
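A minimal sketch, assuming the Hadoop 2.x API, in which DefaultMetricsSystem.initialize loads the configuration for the given prefix and also starts the system. The helper class MetricsSetup, its start method, and the registration names and descriptions are hypothetical:

```java
import org.apache.hadoop.metrics2.MetricsSink;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

// Hypothetical helper; names and descriptions below are placeholders.
class MetricsSetup {
  // Initializes the metrics system for `prefix` and registers one source
  // and one sink with it. Intended to be called once per process.
  static MetricsSystem start(String prefix, MetricsSource source, MetricsSink sink) {
    // Reads hadoop-metrics2-<prefix>.properties, falling back to
    // hadoop-metrics2.properties, from the classpath.
    MetricsSystem ms = DefaultMetricsSystem.initialize(prefix);
    // Producer side: the system polls source.getMetrics(...) each period.
    ms.register("MyComponentSource", "Example source", source);
    // Consumer side: the system pushes records to sink.putMetrics(...).
    ms.register("MyComponentSink", "Example sink", sink);
    return ms;
  }
}
```

A daemon would call start once at startup with its own MetricsSource and MetricsSink instances, and DefaultMetricsSystem.shutdown() when it exits. Sinks can also be declared purely in the properties file, in which case registering them in code is unnecessary.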

Configuration and Filtering

The Metrics2 framework uses PropertiesConfiguration from the Apache Commons Configuration library.

Sinks are specified in a configuration file (e.g., “hadoop-metrics2-test.properties”), as:
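A minimal sketch, assuming the FileSink class that ships with Hadoop; the output file name is an illustrative choice:

```properties
# hadoop-metrics2-test.properties
# "test" is the prefix and "mysink0" is the sink instance name.
test.sink.mysink0.class=org.apache.hadoop.metrics2.sink.FileSink
test.sink.mysink0.filename=mysink0.out
```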

The configuration syntax is:
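As documented in the org.apache.hadoop.metrics2 package Javadoc, each key has four dot-separated parts:

```
[prefix].[source|sink].[instance].[options]
```

For example, in the key test.sink.mysink0.class, the prefix is test, the instance is mysink0, and the option being set is class.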

In the previous example, test is the prefix and mysink0 is an instance name. DefaultMetricsSystem tries to load hadoop-metrics2-[prefix].properties first and, if that file is not found, falls back to the default hadoop-metrics2.properties on the classpath. Note that [instance] is an arbitrary name that uniquely identifies a particular sink instance. The asterisk (*) can be used to specify default options.
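The file lookup order can be sketched in plain Java. This is a hypothetical helper that mirrors the behavior described above, not Hadoop's own loader; the existence check is passed in as a predicate so the logic can be exercised without touching a real classpath:

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper mirroring the documented lookup order; not Hadoop code.
class MetricsConfigLookup {
  static final String DEFAULT_FILE = "hadoop-metrics2.properties";

  // Returns the first candidate file that exists, or empty if neither does.
  static Optional<String> resolve(String prefix, Predicate<String> exists) {
    String specific = "hadoop-metrics2-" + prefix + ".properties";
    if (exists.test(specific)) {
      return Optional.of(specific);
    }
    if (exists.test(DEFAULT_FILE)) {
      return Optional.of(DEFAULT_FILE);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Check against this JVM's actual classpath.
    ClassLoader cl = MetricsConfigLookup.class.getClassLoader();
    Predicate<String> onClasspath = name -> cl.getResource(name) != null;
    System.out.println(resolve("test", onClasspath).orElse("no metrics2 config found"));
  }
}
```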

Here is an example with inline comments to identify the different configuration sections:
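A sketch modeled on the layout of the hadoop-metrics2.properties file that ships with Hadoop 2; the file names and the NameNode period override are illustrative values:

```properties
# Defaults for all prefixes (the "*" wildcard):
# every daemon gets a sink instance named "file" backed by FileSink.
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# Default sampling period, in seconds.
*.period=10

# Per-daemon sections: the first key segment is the daemon prefix.
namenode.sink.file.filename=namenode-metrics.out
datanode.sink.file.filename=datanode-metrics.out
resourcemanager.sink.file.filename=resourcemanager-metrics.out
nodemanager.sink.file.filename=nodemanager-metrics.out

# Override the sampling period for all NameNode sinks only.
namenode.sink.*.period=8
```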

Here is an example set of NodeManager metrics that are dumped into the NodeManager sink file:

Each line starts with a timestamp, followed by the context and record name, and then the name and value of each tag and metric in the record.
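To make the line format concrete, here is a small parser, assuming the layout written by Hadoop's FileSink: `<timestamp> <context>.<record>: name=value, name=value, ...`, with tags and metrics sharing the same name=value list. The class MetricsLine and the sample line are hypothetical; the sample is not captured output:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one line of FileSink output, assuming the layout
//   <timestamp> <context>.<record>: name=value, name=value, ...
// Values that themselves contain ", " would be split incorrectly.
class MetricsLine {
  final long timestamp;
  final String context;
  final String record;
  // Tags and metrics, in the order they appear on the line.
  final Map<String, String> values = new LinkedHashMap<>();

  MetricsLine(String line) {
    int space = line.indexOf(' ');
    timestamp = Long.parseLong(line.substring(0, space));
    // A record with no tags and no metrics has no ": " section at all.
    int colon = line.indexOf(": ", space);
    int end = colon < 0 ? line.length() : colon;
    String qualified = line.substring(space + 1, end); // "<context>.<record>"
    int dot = qualified.indexOf('.');
    context = qualified.substring(0, dot);
    record = qualified.substring(dot + 1);
    if (colon >= 0) {
      for (String pair : line.substring(colon + 2).split(", ")) {
        int eq = pair.indexOf('=');
        values.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
  }

  public static void main(String[] args) {
    // Synthetic input for illustration only.
    MetricsLine m = new MetricsLine(
        "1700000000000 yarn.NodeManagerMetrics: Hostname=host1, ContainersLaunched=3");
    System.out.println(m.context + " / " + m.record + " -> " + m.values);
  }
}
```

Running main prints `yarn / NodeManagerMetrics -> {Hostname=host1, ContainersLaunched=3}`.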

Filtering

By default, filtering can be done by source, context, record, and metric. The Javadoc and wiki discuss the different filtering strategies in more detail.

Example:
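A sketch following the pattern in the org.apache.hadoop.metrics2 package Javadoc, assuming the GlobFilter class and the sink-level context option; the sink instance and the exclude pattern `*Bytes*` are illustrative choices:

```properties
# Use glob patterns for source, record and metric filters.
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}

# A NodeManager file sink that only receives metrics from the "yarn" context...
nodemanager.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
nodemanager.sink.file.filename=nodemanager-metrics.out
nodemanager.sink.file.context=yarn
# ...and drops any metric whose name matches the glob *Bytes*.
nodemanager.sink.file.metric.filter.exclude=*Bytes*
```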

Conclusion

The Metrics2 system for Hadoop provides a gold mine of real-time and historical data that helps administrators monitor Hadoop services and jobs and debug problems associated with them.

Ahmed Radwan is a software engineer at Cloudera, where he contributes to various platform tools and open-source projects.

 
