New testing results show a significant difference between the analytic database performance of Impala compared to batch and procedural development engines, as well as Impala running all 99 TPC-DS-derived queries in the benchmark workload.
2015 was an exciting year for Apache Impala (incubating). Cloudera’s Impala team significantly improved Impala’s scale and stability, which enabled many customers to deploy Impala clusters with hundreds of nodes, run millions of queries,
Fixes in CDH 5.5 make writing Parquet data for Apache Impala (incubating) much easier.
Over the last few months, several Cloudera customers have provided the feedback that Parquet is too hard to configure, with the main problem being finding the right layout for great performance in Impala. For that reasons, CDH 5.5 contains new features that make those configuration problems go away.
Auto-Detection of HDFS Block Size
Contributors from Intel, Cloudera, and the rest of the community have been making strong progress on the Hive-on-Spark initiative. This post provides an update.
[Editor’s note (April 20, 2016): Hive-on-Spark is now GA/shipping starting in CDH 5.7.]
Since its inception about one year ago, the community initiative to make Apache Spark a data processing engine for Apache Hive (HIVE-7292) has attracted widespread interest from developers around the world and gone through phases of rapid development,
Apache Hive 1.2.0, although not a major release, contains significant improvements.
Recently, the Apache Hive community moved to a more frequent, incremental release schedule. So, a little while ago, we covered the Apache Hive 1.0.0 release and explained how it was renamed from 0.14.1 with only minor feature additions since 0.14.0.
Shortly thereafter, Apache Hive 1.1.0 was released (renamed from Apache Hive 0.15.0), which included more significant features—including Hive-on-Spark.
Learn how to read FIX message files directly with Hive, create a view to simplify user queries, and use a flattened Apache Parquet table to enable fast user queries with Impala.
The Financial Information eXchange (FIX) protocol is used widely by the financial services industry to communicate various trading-related activities. Each FIX message is a record that represents an action by a financial party, such as a new order or an execution report.