Category Archives: Impala

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS

Categories: Cloud Impala Performance

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases.

Impala 2.6 brings read/write support for Amazon S3, providing cloud capabilities that are unique among cloud-based analytic databases: direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility. With more and more users looking to deploy and run in public-cloud environments,

Read More

Apache Impala (Incubating) on Amazon: Performance and Cost Considerations for S3 vs. EBS

Categories: Cloud Impala Performance

The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala.

In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support for reading and writing data on Amazon S3, Impala provides cloud capabilities such as direct querying of data in S3, elastic scaling of compute, and seamless data portability and flexibility not found in other cloud-based analytic databases,

Read More

Microsoft Power BI Enables Connectivity to Apache Impala (Incubating)

Categories: Guest Impala

Microsoft recently announced a new Impala Connector for Power BI Desktop (currently in preview, with GA expected early in 2017). Cloudera is also working with Microsoft’s Power BI Engineering team to certify the connector against Impala and ensure it meets critical enterprise requirements such as security. The following Microsoft post about the new connector, by Power BI senior program manager Miguel Llopis, is republished below for your convenience.

In the Power BI Desktop July 2016 Update,

Read More

Analytics and BI on Amazon S3 with Apache Impala (Incubating)

Categories: Cloud Impala Ops and DevOps Performance

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same.

With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to take full advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera’s platform (5.8) includes a major step forward in that area: Impala 2.6 can store data in the Amazon S3 object store and query it there directly.

Read More

BI and SQL Analytics with Apache Impala (Incubating) in CDH 5.8: 3x Faster on Secure Clusters

Categories: CDH Impala

Released with CDH 5.8, Impala 2.6 brings solid performance improvements, particularly for clusters secured by Kerberos running BI workloads on Apache Hadoop.

Just a few months back, we showed you how Impala 2.5 delivered a 4x performance boost compared to Impala 2.3 for BI workloads on Hadoop via the introduction of several features like runtime filters. Here’s an update: Compared to two releases ago, Impala 2.6 delivers 12x better performance on secure workloads and continues this drumbeat of consistent performance improvement.

Read More