Tag Archives: mapreduce 2

5 Pitfalls of Benchmarking Big Data Systems

Categories: Hadoop Performance

Benchmarking Big Data systems is nontrivial. Avoid these traps!

Here at Cloudera, we know how hard it is to get reliable performance benchmarking results. Benchmarking matters because one of the defining characteristics of Big Data systems is the ability to process large datasets faster. “How large” and “how fast” drive technology choices, purchasing decisions, and cluster operations. Even with the best intentions, performance benchmarking is fraught with pitfalls—easy to get numbers,

Read More

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Categories: Hadoop YARN

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

CDH 5.0.0 is the first release of our software distribution where YARN and MapReduce 2 (MR2) is the default MapReduce execution framework,

Read More

Getting MapReduce 2 Up to Speed

Categories: Hadoop MapReduce

Thanks to the improvements described here, CDH 5 will ship with a version of MapReduce 2 that is just as fast (or faster) than MapReduce 1.

Performance fixes are tiny, easy, and boring, once you know what the problem is. The hard work is in putting your finger on that problem: narrowing, drilling down, and measuring, measuring, measuring.

Apache Hadoop is no exception to this rule. Recently, Cloudera engineers set out to ensure that MapReduce performance in Hadoop 2 (MR2/YARN) is on par with,

Read More

Migrating to MapReduce 2 on YARN (For Operators)

Categories: General Hadoop MapReduce YARN

Cloudera Manager lets you add a YARN service in the same way you would add any other Cloudera Manager-managed service.

In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala;

Read More

Migrating to MapReduce 2 on YARN (For Users)

Categories: General Hadoop MapReduce YARN

In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala; allow more sensible and finer-grained resource configuration for better cluster utilization; and permit Hadoop to scale to accommodate more and larger jobs.

Read More