HDFS now includes a comprehensive storage capacity-management approach for moving data across nodes (shipping in CDH 5.8.2 and later).
In HDFS, the DataNode spreads data blocks across local filesystem directories, which are specified with dfs.datanode.data.dir in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different device (for example, on separate HDDs and SSDs).
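As a minimal sketch of such a configuration, assume a DataNode with two hard disks and one SSD. The mount points below are hypothetical and chosen only for illustration; the optional [DISK] and [SSD] prefixes are storage-type tags that HDFS accepts on each entry.

```xml
<!-- hdfs-site.xml (fragment); all mount points are hypothetical -->
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Comma-separated list; each entry is one volume on its own device -->
  <value>[DISK]file:///data/1/dfs/dn,[DISK]file:///data/2/dfs/dn,[SSD]file:///data/ssd1/dfs/dn</value>
</property>
```

Each comma-separated entry becomes one volume, so this DataNode would report three volumes.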
When writing new blocks to HDFS,
The Apache Hadoop project recently announced its 3.0.0-alpha1 release.
Given the scope of a new major release, the Apache Hadoop community decided to release a series of alpha and beta releases leading up to 3.0.0 GA. This gives downstream applications and end users an opportunity to test and provide feedback on the changes, which can be incorporated during the alpha and beta process.
The 3.0.0-alpha1 release incorporates thousands of new fixes,
Recently installed fault-injection techniques are making quality assurance processes yet more rigorous.
In a previous installment of our series about quality assurance inside Cloudera, we described the fault-injection frameworks (AgenTEST and Sapper) that Cloudera Engineering has devised. Between them, these frameworks start and stop injections and determine when and how they should occur.
On that occasion, we presented a number of disk-related injections implemented in AgenTEST, including:
- BurnIO: Runs disk-intensive processes,
Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.
As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll provide a summary of the main improvements. (Hue 3.10 is also available for a quick try in one click on demo.gethue.com.)
New SQL Editor
Hue’s new code editor is a single-page app that is much simpler to use than the previous editor.
Released with CDH 5.8, Impala 2.6 brings solid performance improvements, particularly for clusters secured by Kerberos running BI workloads on Apache Hadoop.
Just a few months back, we showed you how Impala 2.5 delivered a 4x performance boost compared to Impala 2.3 for BI workloads on Hadoop via the introduction of several features like runtime filters. Here’s an update: Compared to two releases ago, Impala 2.6 delivers 12x better performance on secure workloads and continues this drumbeat of consistent performance improvement.