Five years ago, Cloudera shared with the world our plan to transfer the lessons from decades of relational database research to the Apache Hadoop platform via a new SQL engine — Apache Impala — the first and fastest open source MPP SQL engine for Hadoop. Impala enabled SQL users to operate on vast amounts of data in open formats, stored on HDFS originally (with Apache Kudu, Amazon S3, and Microsoft ADLS now also native storage options),
An unmodified TPC-DS-based performance benchmark shows Impala’s leadership compared to a traditional analytic database (Greenplum), especially for multi-user concurrent workloads. Additionally, the benchmark continues to demonstrate a significant performance gap between analytic databases and SQL-on-Hadoop engines like Hive LLAP, Spark SQL, and Presto.
The past year has been one of the biggest for Apache Impala (incubating). Not only has the team continued to work on ever-growing scale and stability, but a number of key capabilities have been rolled out that further solidify Impala as the open standard for high-performance BI and SQL analytics.
Impala users can expect new performance and usability benefits via improved integration with Kudu.
It’s been nearly one year since the public beta announcement of Kudu (now a top-level Apache project) and a noteworthy milestone has been reached: its 1.0 release. This is particularly exciting as Kudu extends the use cases that can be supported on the Apache Hadoop platform, whether it be on-premises or in the cloud,
Impala’s speed now beats the fastest SQL-on-Hadoop alternatives. Test for yourself!
Since the initial beta release of Cloudera Impala more than one year ago (October 2012), we’ve been committed to regularly updating you about its evolution into the standard for running interactive SQL queries across data in Apache Hadoop and Hadoop-based enterprise data hubs. To briefly recap where we are today:
- Impala is being widely adopted.