Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.
Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.
CDH 5.0.0 is the first release of our software distribution in which YARN and MapReduce 2 (MR2) are the default MapReduce execution framework,
The community has voted to release Apache Hadoop 2.4.0.
Hadoop 2.4.0 includes myriad improvements to HDFS and MapReduce, including (but not limited to):
- ACL Support in HDFS — which allows, among other things, easier access to Apache Sentry-managed data by components that use it (already shipping in CDH 5.0.0)
- Native support for rolling upgrades in HDFS (equivalent functionality already shipping inside CDH 4.5.0 and later)
- Use of Protocol Buffers for the HDFS FSImage, for smoother operational upgrades
- Complete HTTPS support in HDFS
- Automatic Failover for ResourceManager HA in YARN
- Preview version of the YARN Timeline Server for storing and serving generic application history
Congratulations to everyone who contributed!
Learn about the new features and enhancements in Cloudera Manager 5, including support for YARN, management of third-party apps and frameworks, and more.
The response to the Oct. 2013 release of Cloudera Enterprise 5 Beta has been overwhelming, and Cloudera is working closely with several customers to incorporate their feedback.
Cloudera Manager 5 is a key part of this release, and in this post, I will provide a brief overview of some key features in Beta 1 as well as introduce some of those planned for Beta 2 (to be released in early 2014).
An overview of some of Cloudera’s contributions to YARN that help support management of multiple resources, from multi-resource scheduling in the Fair Scheduler to node-level enforcement.
As Apache Hadoop becomes ubiquitous, it is becoming more common for users to run diverse sets of workloads on Hadoop, and these jobs are more likely to have different resource profiles. For example, a MapReduce distcp job or Cloudera Impala query that does a simple scan on a large table may be heavily disk-bound and require little memory.
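To make the idea of differing resource profiles concrete, here is a minimal sketch of how a multi-resource scheduler might compare jobs. It assumes a Dominant Resource Fairness style policy over CPU and memory; the job names, cluster sizes, and helper functions are hypothetical and are not taken from the Fair Scheduler's code.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """A bundle of the two resource types tracked in this sketch."""
    vcores: float
    memory_mb: float


def dominant_share(usage: Resources, capacity: Resources):
    """Return (share, resource_name) for the resource a job uses most,
    measured as a fraction of total cluster capacity."""
    shares = {
        "vcores": usage.vcores / capacity.vcores,
        "memory_mb": usage.memory_mb / capacity.memory_mb,
    }
    name = max(shares, key=shares.get)
    return shares[name], name


def next_to_schedule(usages: dict, capacity: Resources) -> str:
    """Pick the job with the smallest dominant share (assumed policy)."""
    return min(usages, key=lambda job: dominant_share(usages[job], capacity)[0])


# Hypothetical cluster and jobs, for illustration only.
cluster = Resources(vcores=100, memory_mb=400_000)
jobs = {
    "cpu_heavy_job": Resources(vcores=10, memory_mb=8_000),
    "memory_heavy_job": Resources(vcores=5, memory_mb=120_000),
}

for job_name, usage in jobs.items():
    share, resource = dominant_share(usage, cluster)
    print(f"{job_name}: dominant resource = {resource}, share = {share:.2f}")

print("next to schedule:", next_to_schedule(jobs, cluster))
```

Comparing jobs by their dominant share, rather than by memory alone, keeps a job that is light on one resource but heavy on another from being treated as small.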
Cloudera Manager lets you add a YARN service in the same way you would add any other Cloudera Manager-managed service.
In Apache Hadoop 2, YARN and MapReduce 2 (MR2) are long-needed upgrades for scheduling, resource management, and execution in Hadoop. At their core, the improvements separate cluster resource management capabilities from MapReduce-specific logic. They enable Hadoop to share resources dynamically between MapReduce and other parallel processing frameworks, such as Cloudera Impala;