Tag Archives: mr2

5 Pitfalls of Benchmarking Big Data Systems

Categories: Hadoop Performance

Benchmarking Big Data systems is nontrivial. Avoid these traps!

Here at Cloudera, we know how hard it is to get reliable performance benchmarking results. Benchmarking matters because one of the defining characteristics of Big Data systems is the ability to process large datasets faster. “How large” and “how fast” drive technology choices, purchasing decisions, and cluster operations. Even with the best intentions, performance benchmarking is fraught with pitfalls—easy to get numbers,

Read more

How Apache Hadoop YARN HA Works

Categories: Hadoop YARN

Thanks to recent work upstream, YARN is now a highly available service. This post explains its architecture and configuration details.

YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. With planned (upgrades) or unplanned (node crashes) events, this central service, and YARN itself, could become unavailable.

This post details Cloudera’s recent work in the Hadoop community (YARN-149) to make the ResourceManager (and thus YARN) highly available.

Read more

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Categories: Hadoop YARN

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

CDH 5.0.0 is the first release of our software distribution where YARN and MapReduce 2 (MR2) is the default MapReduce execution framework,

Read more

The Truth About MapReduce Performance on SSDs

Categories: Hadoop Hardware MapReduce Performance

Cost-per-performance, not cost-per-capacity, turns out to be the better metric for evaluating the true value of SSDs.

In the Big Data ecosystem, solid-state drives (SSDs) are increasingly considered a viable, higher-performance alternative to rotational hard-disk drives (HDDs). However, few results from actual testing are available to the public.

Recently, Cloudera engineers did such a study based on a combination of SSDs and HDDs, with the goal of determining to what extent SSDs accelerate different MapReduce workloads,

Read more

Getting MapReduce 2 Up to Speed

Categories: Hadoop MapReduce

Thanks to the improvements described here, CDH 5 will ship with a version of MapReduce 2 that is just as fast (or faster) than MapReduce 1.

Performance fixes are tiny, easy, and boring, once you know what the problem is. The hard work is in putting your finger on that problem: narrowing, drilling down, and measuring, measuring, measuring.

Apache Hadoop is no exception to this rule. Recently, Cloudera engineers set out to ensure that MapReduce performance in Hadoop 2 (MR2/YARN) is on par with,

Read more