Apache Impala Archives - Page 3 of 4

September 22, 2019 | Technical

Shared Transactional Tables: The Foundation of Next Generation Big Data Warehousing

The next generation of big data warehousing is being built on transactional tables. Transactions, of course, enable new use cases that require updating, deleting, and merging rows of data. But more importantly, a transaction-centric design enables advanced features such as materialized views, aggressive data caching, and efficient replication between warehouses, which are critical for modern […]

by Cloudera, Sanjay Radia 7 min read

April 22, 2019 | Technical

Fine-Grained Authorization with Apache Kudu and Impala

Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it manages, including Apache Kudu tables. Given that Impala is a very common way to access the data stored in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in multi-tenant clusters even though Kudu does not […]

by Grant Henke 4 min read

Apache Impala Apache Kudu Apache Sentry Apache Spark

March 4, 2019 | Technical

Transparent Hierarchical Storage Management with Apache Kudu and Impala

When picking a storage option for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that […]

by Grant Henke 9 min read

Apache Impala Apache Kudu Apache Parquet Cloudera Enterprise

January 30, 2019 | Technical

Scalability Improvement of Apache Impala 2.12.0 in CDH 5.15.0

Key Takeaways: We have significantly improved Impala in CDH 5.15.0 to address some of the scalability bottlenecks in query execution. 64 concurrent streams of TPC-DS queries at 10TB scale in a 135-node cluster now run at 6x query throughput compared to previous releases. In addition to running faster, the query success rate also improved from […]

by Michael Ho, Lars Volker, Laurel Hale 5 min read

Apache Impala Cloudera Enterprise

December 13, 2018 | Technical

Assessment of Apache Impala Performance using Cloudera Manager Metrics – Part 1 of 3

For a user-facing system like Apache Impala, bad performance and downtime can have serious negative impacts on your business. Given the complexity of the system and all the moving parts, troubleshooting can be time-consuming and overwhelming. In this blog post series, we are going to show how the charts and metrics on Cloudera Manager (CM) […]

by Mansi Maharana, Suhita Goswami 9 min read

Apache Impala

December 12, 2018 | Technical

Assessment of Apache Impala Performance using Cloudera Manager Metrics – Part 1 of 3

For a user-facing system like Apache Impala, bad performance and downtime can have serious negative impacts on your business. Given the complexity of the system and all the moving parts, troubleshooting can be time-consuming and overwhelming. In this blog post series, we are going to show how the charts and metrics on Cloudera Manager (CM) […]

by Mansi Maharana, Suhita Goswami 10 min read

Apache Impala Cloudera Enterprise Performance

December 19, 2017 | Technical

Faster Performance for Selective Queries

One of the principal features used in analytic databases is table partitioning. This feature is so frequently used because of its ability to significantly reduce query latency by allowing the execution engine to skip reading data that is not necessary for the query. For example, consider a table of events partitioned on the event time […]

by Lars Volker 8 min read

Apache Impala Cloudera Enterprise

May 30, 2017 | Technical

Bi-temporal data modeling with Envelope

One of the most fundamental aspects a data model can convey is how something changes over time. This makes sense when considering that we build data models to capture what is happening in the real world, and the real world is constantly changing. The challenge is that it’s not just that new things are occurring, […]

by Jeremy Beard 8 min read

Apache Impala Apache Kudu Apache Spark Cloudera Enterprise Data Ingestion

April 25, 2017 | Technical

Apache Impala Leads Traditional Analytic Database

An unmodified TPC-DS-based performance benchmark shows Impala’s leadership compared to a traditional analytic database (Greenplum), especially for multi-user concurrent workloads. Additionally, the benchmark continues to demonstrate a significant performance gap between analytic databases and SQL-on-Hadoop engines like Hive LLAP, Spark SQL, and Presto. The past year has been one of the biggest for Apache Impala (incubating). Not only […]

by Cloudera 7 min read

Apache Impala Cloudera Enterprise Performance

February 16, 2017 | Technical

Latest Impala Cookbook

Over the past year (and through several releases), Apache Impala (incubating) has added numerous new features and performance enhancements that better enable high-performance SQL analytics over big data. Thus, it is time again for an update to the Impala cookbook, which contains best practices for these new features, updated guidelines, and more detailed examples. Note: This […]

by Juan Yu < 1 min read

Apache Impala
