Latest Impala Cookbook

Categories: Impala

Over the past year (and through several releases), Apache Impala (incubating) has added numerous new features and performance enhancements that better enable high-performance SQL analytics over big data. It is therefore time again to update the Impala cookbook, which contains best practices for these new features, updated guidelines, and more detailed examples.

Note: This cookbook does not yet capture best practices for the major new advancements available with the recent GA of Kudu.

Apache Hadoop 3.0.0-alpha2 Released

Categories: Community Hadoop

The Apache Hadoop project announced the release of 3.0.0-alpha2 on January 25th, 2017. This is the second alpha release in the 3.0.0 release series leading up to 3.0.0 GA, and it incorporates 857 new fixes, improvements, and features since 3.0.0-alpha1 was released last September. It’s worth reading our previous blog post about 3.0.0-alpha1; in this post, we’ll discuss the improvements that landed in alpha2.

Classpath Isolation for Hadoop Client Jars

Many Java developers have experienced the pain of classpath conflicts caused by a lack of classpath isolation.

Untangling Apache Hadoop YARN, Part 5: Using FairScheduler queue properties

Categories: Hadoop YARN

In Part 4, we described the most commonly used FairScheduler properties in Apache Hadoop. In Part 5, we’ll provide examples showing how these properties can be used, individually and in combination, to achieve commonly desired behavior such as application prioritization and queue organization.

Example: Best Effort Queue

Summary: Create a “best effort” queue that runs applications when the cluster is underutilized.  

Implementation: In FairScheduler,

Performance comparison of different file formats and storage engines in the Apache Hadoop ecosystem

Categories: Avro Guest Hadoop HBase Kudu Parquet

Zbigniew Baranowski is a database systems specialist and a member of a group that provides and supports central database and Hadoop-based services at CERN. This blog was originally published on CERN’s “Databases at CERN” blog and is syndicated here with CERN’s permission.

TOPIC

This post presents a performance comparison of a few popular data formats and storage engines available in the Apache Hadoop ecosystem: Apache Avro,

New in Cloudera Enterprise 5.10: Hue SQL Editor and Security Improvements

Categories: Hadoop Hue Oozie

Cloudera Enterprise 5.10 includes the latest updates to Hue, the intelligent editor for SQL developers and analysts.

As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.10 includes an updated version of Hue. The main enhancements are summarized below. (You can also try Hue from C5.10 with one click at demo.gethue.com.)

SQL Improvements

The Hue editor keeps getting better with these major improvements:

Row Count

The number of rows returned is displayed so you can quickly see the size of the dataset.
