Category Archives: HDFS

Apache Hadoop 2.0.3-alpha Released

Categories: General Hadoop HDFS MapReduce YARN

Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. This release fixes over 500 issues (covering the Common, HDFS, MapReduce and YARN sub-projects) since the 2.0.2-alpha release in October last year. In addition to bug fixes and general improvements, the more noteworthy changes include:

  • HDFS High Availability (HA) can now use a Quorum Journal Manager (QJM) for sharing NameNode edit logs (HDFS-3077).

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,

Secrets of Cloudera Support: The Champagne Strategy

Categories: CDH Hadoop HBase HDFS Hive Impala Support Use Case

At Cloudera, we take great pride in drinking our own champagne. That pride extends to our support team in particular.

Cloudera Manager, our end-to-end management platform for CDH (Cloudera’s open-source, enterprise-ready distribution of Apache Hadoop and related projects), has a feature that allows subscription customers to send us a snapshot of their cluster. These snapshots land in a CDH cluster at Cloudera, where we can run various forms of data processing and aggregation on them.

Quorum-based Journaling in CDH4.1

Categories: CDH General HDFS

A few weeks back, Cloudera announced CDH 4.1, the latest update release to Cloudera’s Distribution including Apache Hadoop. This is the first release to introduce truly standalone High Availability for the HDFS NameNode, with no dependence on special hardware or external software. This post explains the inner workings of this new feature from a developer’s standpoint. If, instead, you are seeking information on configuring and operating this feature, please refer to the CDH4 High Availability Guide.

CDH4.1 Now Released!

Categories: CDH General Hadoop HBase HDFS Hive Oozie Pig Sqoop ZooKeeper

Update time! As a reminder, Cloudera releases a major version of CDH, our 100% open source distribution of Apache Hadoop and related projects, annually, and then ships updates to CDH every three months. Updates primarily comprise bug fixes, but we also add enhancements. We include only fixes or enhancements that maintain compatibility, improve system stability, and still allow customers and users to skip updates as they see fit.

We’re pleased to announce the availability of CDH4.1. 
