Category Archives: Hive

Cloudera Enterprise 5.7 is Released

Categories: CDH Cloudera Manager Cloudera Navigator Hive Spark

Cloudera Enterprise 5.7 is now generally available (comprising CDH 5.7, Cloudera Manager 5.7, and Cloudera Navigator 2.6).

Cloudera is excited to announce the general availability of Cloudera Enterprise 5.7! Main highlights of this release include production-ready Hive-on-Spark functionality, which will help users accelerate their use of Apache Spark as a data processing standard; 4x performance gains for Apache Impala (incubating); easier cluster configuration and utilization reporting; and end-to-end encryption for Apache Spark data.

Read More

Apache Hive 2.0 is Released

Categories: CDH Hive

The recently released Apache Hive 2.0 contains some exciting improvements, many of which are already available in CDH 5.x.

Recently, the Apache Hive community announced Hive 2.0.0. This is a larger release than the previous one (covered here), with a lengthy list of new features (many experimental), enhancements, and bug fixes. Cloudera’s Hive team has been working with the community for months to drive toward this significant release.

Read More

New SQL Benchmarks: Apache Impala (incubating) Uniquely Delivers Analytic Database Performance

Categories: Hive Impala Performance Spark

New test results show a significant gap between the analytic database performance of Impala and that of batch and procedural development engines, and show Impala running all 99 TPC-DS-derived queries in the benchmark workload.

2015 was an exciting year for Apache Impala (incubating). Cloudera’s Impala team significantly improved Impala’s scale and stability, which enabled many customers to deploy Impala clusters with hundreds of nodes, run millions of queries,

Read More

New in CDH 5.5: Apache Parquet Usability Improvements

Categories: CDH HDFS Hive Impala Parquet Performance

Fixes in CDH 5.5 make writing Parquet data for Apache Impala (incubating) much easier.

Over the last few months, several Cloudera customers have told us that Parquet is too hard to configure, with the main problem being finding the right layout for great performance in Impala. For that reason, CDH 5.5 contains new features that make those configuration problems go away.

Auto-Detection of HDFS Block Size

For example,

Read More

Progress Report: Hive-on-Spark Nears Production Readiness

Categories: Cloudera Labs Hive Spark

Contributors from Intel, Cloudera, and the rest of the community have been making strong progress on the Hive-on-Spark initiative. This post provides an update.

[Editor’s note (April 20, 2016): Hive-on-Spark is now GA/shipping starting in CDH 5.7.]

Since its inception about one year ago, the community initiative to make Apache Spark a data processing engine for Apache Hive (HIVE-7292) has attracted widespread interest from developers around the world and gone through phases of rapid development,

Read More