Category Archives: HDFS

Apache Hadoop 2.3.0 is Released (HDFS Caching FTW!)

Categories: Community Hadoop HDFS Impala

Hadoop 2.3.0 includes hundreds of new fixes and features, but none more important than HDFS caching.

The Apache Hadoop community has voted to release Hadoop 2.3.0, which includes (among many other things):

  • In-memory caching for HDFS, including centralized administration and management
  • Groundwork for future support of heterogeneous storage in HDFS
  • Simplified distribution of MapReduce binaries via the YARN Distributed Cache

You can read the release notes here.

Read More

Apache Hadoop 2 is Here and Will Transform the Ecosystem

Categories: Community Hadoop HDFS YARN

The release of Apache Hadoop 2, as announced today by the Apache Software Foundation, is an exciting one for the entire Hadoop ecosystem.

Cloudera engineers have been working hard for many months with the rest of the vast Hadoop community to ensure that Hadoop 2 is the best it can possibly be, for the users of Cloudera’s platform as well as all Hadoop users generally. Hadoop 2 contains many major advances,

Read More

How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop

Categories: Hadoop HDFS

One of the key principles behind Apache Hadoop is the idea that moving computation is cheaper than moving data: we prefer to move the computation to the data whenever possible, rather than the other way around. Because of this, the Hadoop Distributed File System (HDFS) typically handles many “local reads”, that is, reads where the reader is on the same node as the data.

Initially, local reads in HDFS were handled the same way as remote reads: the client connected to the DataNode via a TCP socket and transferred the data via DataTransferProtocol.
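
To make the local/remote distinction concrete, here is a minimal Python sketch of how a reader might decide whether a read is local. It is an illustration only, not HDFS client code: `BlockReplica`, `is_local_read`, and `choose_replica` are hypothetical names, and comparing hostnames is an assumed stand-in for whatever locality check the real client performs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BlockReplica:
    """Hypothetical record: one copy of a block and the node that stores it."""
    block_id: int
    host: str  # hostname of the DataNode holding this replica


def is_local_read(client_host: str, replica: BlockReplica) -> bool:
    # A read is "local" when the reader runs on the same node as the data.
    return client_host == replica.host


def choose_replica(client_host: str, replicas: Sequence[BlockReplica]) -> BlockReplica:
    """Prefer a replica on the client's own node; otherwise use the first one listed."""
    if not replicas:
        raise ValueError("no replicas available for this block")
    for replica in replicas:
        if is_local_read(client_host, replica):
            return replica
    return replicas[0]
```

For a client on `node2` and replicas on `node1`, `node2`, and `node3`, `choose_replica` returns the `node2` copy. Under the original design described above, even that local copy was still fetched from the DataNode over a TCP socket.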

Read More

Demo: HDFS File Operations Made Easy with Hue

Categories: HDFS Hue

Managing and viewing data in HDFS is an important part of Big Data analytics. Hue, the open source web-based interface that makes Apache Hadoop easier to use, helps you do that through a GUI in your browser, instead of logging into a Hadoop gateway host with a terminal program and using the command line.

The first episode in a new series of Hue demos, the video below demonstrates how to get up and running quickly with HDFS file operations via Hue’s File Browser application.

Read More

Apache Hadoop 2.0.3-alpha Released

Categories: General Hadoop HDFS MapReduce YARN

Last week the Apache Hadoop PMC voted to release Apache Hadoop 2.0.3-alpha, the latest in the Hadoop 2 release series. This release fixes over 500 issues (covering the Common, HDFS, MapReduce and YARN sub-projects) since the 2.0.2-alpha release in October last year. In addition to bug fixes and general improvements, the more noteworthy changes include:

  • HDFS High Availability (HA) can now use a Quorum Journal Manager (QJM) for sharing namenode edit logs (HDFS-3077).
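
The quorum idea behind the Quorum Journal Manager can be sketched in a few lines of Python. This is a simplified illustration, assuming the usual majority rule (an edit counts as durable once more than half of the journal nodes have acknowledged it); the function names are hypothetical and are not part of any Hadoop API.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("need at least one journal node")
    return journal_nodes // 2 + 1


def edit_is_committed(acks: int, journal_nodes: int) -> bool:
    # Assumed rule: an edit is committed once a majority has acknowledged it.
    return acks >= majority(journal_nodes)


def tolerated_failures(journal_nodes: int) -> int:
    """How many journal nodes can be down while a majority is still reachable."""
    return journal_nodes - majority(journal_nodes)
```

With three journal nodes, two acknowledgements are enough and one node may fail; with five, three are needed and two may fail.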

Read More