Cloudera Enterprise 5.7 is now generally available (comprising CDH 5.7, Cloudera Manager 5.7, and Cloudera Navigator 2.6).
Cloudera is excited to announce the general availability of Cloudera Enterprise 5.7! Main highlights of this release include production-ready Hive-on-Spark functionality, which will help users accelerate their use of Apache Spark as a data processing standard; 4x performance gains for Apache Impala (incubating); easier cluster configuration and utilization reporting; and end-to-end encryption for Apache Spark data.
The recently released Apache Hive 2.0 contains some exciting improvements, many of which are already available in CDH 5.x.
Recently, the Apache Hive community announced Hive 2.0.0. This release is larger than the previous one (covered here), with a lengthy list of new features (many experimental), enhancements, and bug fixes. Cloudera’s Hive team has been working with the community for months to drive toward this significant release.
New testing results show a significant difference between the analytic database performance of Impala and that of batch and procedural development engines, and show Impala running all 99 TPC-DS-derived queries in the benchmark workload.
2015 was an exciting year for Apache Impala (incubating). Cloudera’s Impala team significantly improved Impala’s scale and stability, which enabled many customers to deploy Impala clusters with hundreds of nodes, run millions of queries,
Fixes in CDH 5.5 make writing Parquet data for Apache Impala (incubating) much easier.
Over the last few months, several Cloudera customers have told us that Parquet is too hard to configure, with the main problem being finding the right layout for great performance in Impala. For that reason, CDH 5.5 contains new features that make those configuration problems go away.
Auto-Detection of HDFS Block Size
Contributors from Intel, Cloudera, and the rest of the community have been making strong progress on the Hive-on-Spark initiative. This post provides an update.
[Editor’s note (April 20, 2016): Hive-on-Spark is now GA/shipping starting in CDH 5.7.]
Since its inception about one year ago, the community initiative to make Apache Spark a data processing engine for Apache Hive (HIVE-7292) has attracted widespread interest from developers around the world and gone through phases of rapid development,