Cloudera Engineering Blog · Hadoop Posts

How Apache Hadoop YARN HA Works

Thanks to recent work upstream, YARN is now a highly available service. This post explains its architecture and configuration details.

YARN, the next-generation compute and resource management framework in Apache Hadoop, until recently had a single point of failure: the ResourceManager, which coordinates work in a YARN cluster. During planned events (such as upgrades) or unplanned ones (such as node crashes), this central service, and with it YARN itself, could become unavailable.

Apache Hadoop YARN: Avoiding 6 Time-Consuming "Gotchas"

Understanding some key differences between MR1 and MR2/YARN will make your migration much easier.

Here at Cloudera, we recently finished a push to get Cloudera Enterprise 5 (containing CDH 5.0.0 + Cloudera Manager 5.0.0) out the door along with more than 100 partner certifications.

Hello, Apache Hadoop 2.4.0

The community has voted to release Apache Hadoop 2.4.0.

Hadoop 2.4.0 includes myriad improvements to HDFS and MapReduce, including (but not limited to):

This Month in the Ecosystem (March 2014)

Welcome to our seventh edition of “This Month in the Ecosystem,” a digest of highlights from March 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

More good news for the ecosystem!

Cloudera Enterprise 5 is Now Generally Available!

The GA release of Cloudera Enterprise 5 signifies the evolution of the platform from a mere Apache Hadoop distribution into an enterprise data hub.

We are thrilled to announce the GA release of Cloudera Enterprise 5 (comprising CDH 5.0 and Cloudera Manager 5.0). 

Index-Level Security Comes to Cloudera Search

The integration of Apache Sentry with Apache Solr helps Cloudera Search meet important security requirements.

As you have learned in previous blog posts, Cloudera Search brings the power of Apache Hadoop to a wide variety of business users via the ease and flexibility of full-text querying provided by Apache Solr. We have also done significant work to make Cloudera Search easy to add to an existing Hadoop cluster:

The Truth About MapReduce Performance on SSDs

Cost-per-performance, not cost-per-capacity, turns out to be the better metric for evaluating the true value of SSDs.

In the Big Data ecosystem, solid-state drives (SSDs) are increasingly considered a viable, higher-performance alternative to rotational hard-disk drives (HDDs). However, few results from actual testing are available to the public.

This Month in the Ecosystem (February 2014)

Welcome to our sixth edition of “This Month in the Ecosystem,” a digest of highlights from February 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

February being a short month, the list is relatively short — but never confuse quantity with quality!

A Guide to Checkpointing in Hadoop

Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one.

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop clusters.

Apache Hadoop 2.3.0 is Released (HDFS Caching FTW!)

Hadoop 2.3.0 includes hundreds of new fixes and features, but none more important than HDFS caching.

The Apache Hadoop community has voted to release Hadoop 2.3.0, which includes (among many other things):
