Category Archives: CDH

New in CDH 5.7: Improved Performance, Security, and SQL Experience in Hue

Categories: CDH Hue

CDH 5.7 includes more than 1,500 changes to Hue, the Web UI that makes Apache Hadoop easier to use.

This release carries over the emphasis on performance and security from 5.5, and it also considerably improves the SQL user experience.

In this post, we’ll cover some highlights.

New Hive Metastore Interface

This app is now on a single page, 

How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr

Categories: CDH Guest Search Use Case ZooKeeper

Thanks to Karthik Vadla, Abhi Basu, and Monica Martinez-Canales of Intel Corp. for the following guest post about using CDH for cost-effective processing/indexing of DICOM (medical) images.

Medical imaging has rapidly become the best non-invasive method to evaluate a patient and determine whether a medical condition exists. Imaging is used to assist in the diagnosis of a condition and, in most cases, is the first step of the journey through the modern medical system.

How-to: Deploy a Secure Enterprise Data Hub on AWS

Categories: CDH Cloud How-to Ops and DevOps

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud. 

There are several best practices for deploying a secure Apache Hadoop-powered enterprise data hub (EDH) cluster on Amazon Web Services (AWS), including use of Centrify Express for Linux-to-Active Directory host integration and Microsoft Active Directory as the core integration point for identity, authentication, authorization, and public key infrastructure (PKI).

Quality Assurance at Cloudera: Running/Upgrading to New Releases on Our Own EDH Cluster

Categories: CDH Cloudera Manager Testing

Learn why running real workloads on Cloudera’s internal EDH cluster is an important step in the overall QA process before releases.

At Cloudera, we strive to deliver a stable, reliable Apache Hadoop-based platform without sacrificing cutting-edge features. (See this post for an introduction to that process.)

In the past, we have written about how the Cloudera Support organization’s internal cluster helps improve the customer experience via CDH components such as Apache Impala (incubating) and Cloudera Search.

Apache Impala (incubating) in CDH 5.7: 4x Faster for BI Workloads on Apache Hadoop

Categories: CDH Impala Performance

Impala 2.5, now shipping in CDH 5.7, brings significant performance improvements and some highly requested features.

Impala has been a high-performance analytic query engine from the beginning. Even in its initial production release in 2013, it performed 2x faster than a traditional DBMS, and each subsequent release has continued to demonstrate the wide performance gap between Impala’s analytic-database architecture and SQL-on-Apache Hadoop alternatives.
