Category Archives: YARN

How-to: Secure YARN Containers with Cloudera Navigator Encrypt

Categories: Cloudera Navigator Security YARN

Learn how Cloudera Navigator Encrypt brings data security to YARN containers.

With the introduction of transparent data encryption in HDFS, we are now a big step closer to a secure platform in the Apache Hadoop world. However, there are still gaps in the platform, including how YARN and its applications manage their cache. In this post, I’ll explain how Cloudera Navigator Encrypt fills that particular gap.

Use Case

When a YARN application runs in a cluster it can sometimes spill data to the hard disk,

Read More

Apache Spark Resource Management and YARN App Models

Categories: Spark YARN

A concise look at the differences between how Spark and MapReduce manage cluster resources under YARN.

The most popular Apache YARN application after MapReduce itself is Apache Spark. At Cloudera, we have worked hard to stabilize Spark-on-YARN (SPARK-1101), and CDH 5.0.0 added support for Spark on YARN clusters.

In this post, you’ll learn about the differences between the Spark and MapReduce architectures, why you should care,

Read More

How Apache Hadoop YARN HA Works

Categories: Hadoop YARN

Thanks to recent work upstream, YARN is now a highly available service. This post explains its architecture and configuration details.

YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. During planned events (such as upgrades) or unplanned ones (such as node crashes), this central service, and with it YARN itself, could become unavailable.

This post details Cloudera’s recent work in the Hadoop community (YARN-149) to make the ResourceManager (and thus YARN) highly available.

Read More

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Categories: Hadoop YARN

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

CDH 5.0.0 is the first release of our software distribution in which MapReduce 2 (MR2) on YARN is the default MapReduce execution framework,

Read More

Hello, Apache Hadoop 2.4.0

Categories: CDH Hadoop YARN

The community has voted to release Apache Hadoop 2.4.0.

Hadoop 2.4.0 includes myriad improvements to HDFS and MapReduce, including (but not limited to):

  • ACL Support in HDFS — which allows, among other things, easier access to Apache Sentry-managed data by components that use it (already shipping in CDH 5.0.0)
  • Native support for rolling upgrades in HDFS (equivalent functionality already shipping inside CDH 4.5.0 and later)
  • Use of protocol buffers for the HDFS FSImage, for smooth operational upgrades
  • Complete HTTPS support in HDFS
  • Automatic Failover for ResourceManager HA in YARN
  • Preview version of the YARN Timeline Server for storing and serving generic application history

Congratulations to everyone who contributed!

Read More