Category Archives: CDH

Inside the Apache Solr JSON Facet API

Categories: CDH Search

Solr 5 includes a completely rewritten faceted search and analytics module with a structured JSON API to control the faceting and analytics commands. Here’s how it works.

Since I joined Cloudera a few years ago to help bring search-powered analytics to Cloudera’s platform, I’ve been working actively upstream alongside the rest of the Solr community to develop new functionality that will drive more interesting applications on Cloudera Search (which is based on an integration of Solr with the Apache Hadoop ecosystem).
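
To make the "structured JSON API" concrete, here is a minimal Python sketch that builds a request using Solr's `json.facet` parameter. It defines a `terms` facet that buckets documents by a field, with an `avg()` aggregation nested inside each bucket. The field names `cat` and `price` and the helper `build_facet_params` are hypothetical. The sketch only constructs the query string and does not contact a Solr server.

```python
import json
from urllib.parse import urlencode


def build_facet_params(field, stat_field, limit=5):
    """Return query parameters for a terms facet on `field` with a nested avg() on `stat_field`.

    `field` and `stat_field` are hypothetical schema fields used for illustration.
    """
    facet = {
        "categories": {
            "type": "terms",   # bucket documents by distinct values of `field`
            "field": field,
            "limit": limit,    # return at most `limit` buckets
            "facet": {
                # aggregation computed separately within each bucket
                "avg_value": f"avg({stat_field})",
            },
        }
    }
    # rows=0: return only facet results, no matching documents
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


params = build_facet_params("cat", "price")
query_string = urlencode(params)
print(query_string)
```

Appended to a request against a collection's query handler (for example `http://localhost:8983/solr/<collection>/select?`, where host and collection are assumptions), this would ask for the top categories and the average price within each. Because facets and aggregations are plain JSON objects, sub-facets can be nested inside buckets instead of being issued as separate requests.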


How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop

Categories: CDH Hadoop HDFS

HDFS now includes (shipping in CDH 5.8.2 and later) a comprehensive storage capacity-management approach for moving data not only across nodes but also across the disks within a single DataNode.

In HDFS, the DataNode spreads the data blocks into local filesystem directories, which can be specified using dfs.datanode.data.dir in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different device (for example, on separate HDD and SSD).
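
As an illustration of how volumes are declared, the sketch below reads `dfs.datanode.data.dir` from an `hdfs-site.xml` fragment and lists each volume. The value is a comma-separated list of directories. The optional `[DISK]` or `[SSD]` prefix is Hadoop's storage-type tag for heterogeneous storage, and untagged directories are treated as `DISK`. The paths and the `data_dirs` helper are hypothetical, and this is not how the DataNode itself parses its configuration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml fragment: two HDD volumes and one SSD volume.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>[DISK]/data/1/dfs/dn,[DISK]/data/2/dfs/dn,[SSD]/data/ssd/dfs/dn</value>
  </property>
</configuration>"""

# Optional "[TYPE]" storage-type tag followed by the directory path.
_ENTRY = re.compile(r"^\s*(?:\[(\w+)\])?(.+?)\s*$")


def data_dirs(xml_text):
    """Return (storage_type, directory) pairs for every configured volume."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.datanode.data.dir":
            volumes = []
            for entry in prop.findtext("value").split(","):
                if not entry.strip():
                    continue  # skip empty entries such as a trailing comma
                match = _ENTRY.match(entry)
                # Untagged directories default to the DISK storage type.
                storage_type = (match.group(1) or "DISK").upper()
                volumes.append((storage_type, match.group(2)))
            return volumes
    return []  # property not present


for storage_type, path in data_dirs(HDFS_SITE):
    print(f"{storage_type:<5} {path}")
```

Each entry in the list is one volume. Blocks are written across these directories, so volumes on devices of different sizes or ages can fill up unevenly.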

When writing new blocks to HDFS, …


Getting to Know the Apache Hadoop 3 Alpha

Categories: CDH Hadoop

The Apache Hadoop project recently announced its 3.0.0-alpha1 release.

Given the scope of a new major release, the Apache Hadoop community decided to publish a series of alpha and beta releases leading up to 3.0.0 GA. This gives downstream applications and end users an opportunity to test and provide feedback on the changes, which can be incorporated during the alpha and beta process.

The 3.0.0-alpha1 release incorporates thousands of new fixes, …


Quality Assurance at Cloudera: Highly-Controlled Disk Injection

Categories: CDH Testing Tools

Newly implemented fault-injection techniques are making quality assurance processes even more rigorous.

In a previous installment of our series about quality assurance inside Cloudera, we described the fault-injection frameworks (AgenTEST and Sapper) that Cloudera Engineering has devised. These frameworks start and stop injections and determine when and how they occur.

On that occasion, we presented a number of disk-related injections implemented in AgenTEST, including:

  • BurnIO: Runs disk-intensive processes, …


New in Cloudera Enterprise 5.8: SQL Editor and Other Productivity Improvements

Categories: CDH Hue Search Sentry

Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.

As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll summarize the main improvements. (You can also try Hue 3.10 in one click at demo.gethue.com.)

New SQL Editor

Hue’s new code editor is a single-page app that is much simpler to use than the previous editor.
