Apache Impala Archives - Cloudera Blog

July 13, 2023 | Technical

12 Times Faster Query Planning With Iceberg Manifest Caching in Impala

Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues developing an implementation of the Iceberg specification in the form of a Java library. Several compute engines such as Impala, Hive, Spark, and Trino support querying data in the Iceberg table format by adopting this Java library provided by the Apache […]

by Riza Suminto 7 min read

July 11, 2023 | Technical

Integrating Cloudera Data Warehouse with Kudu Clusters

Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. More than 200 Cloudera customers have implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases successfully over the last decade, with thousands of nodes […]

by Manish Maheshwari, Abhishek Rawat, Varun Jaitly 4 min read

Apache Impala Apache Kudu Data Hub Data Warehouse

August 4, 2022 | Technical

Speeding up Queries With Z-Order

Z-order is an ordering for multi-dimensional data, e.g. rows in a database table. Once data is in Z-order it is possible to efficiently search against more columns. This article reveals how Z-ordering works and how one can use it with Apache Impala. In a previous blog post, we demonstrated the power of Parquet page indexes, […]

by Zoltán Borók-Nagy, Norbert Luksa 12 min read

Apache Impala
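
As a rough illustration of the idea in the teaser above, the sketch below computes a Z-order (Morton) key by interleaving the bits of two integer columns and sorts rows by that key. It is a minimal, generic sketch and not Impala's implementation; the function name `interleave_bits` and the 16-bit width are assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return a Morton (Z-order) code for two non-negative integers.

    Hypothetical helper for illustration only: bits of x land on even
    positions and bits of y on odd positions of the result.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bit i -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bit i -> position 2i + 1
    return code


# Sort a small 4x4 grid of (x, y) "rows" by their Z-order key.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(*p))
```

Rows that are close in both columns tend to end up near each other in `z_sorted`, which is why min/max statistics over ranges of Z-ordered data can help prune on more than one column.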

November 22, 2021 | Business

Addressing the Three Scalability Challenges in Modern Data Platforms

How the Cloudera Data Platform helps organizations overcome scalability challenges

by Cloudera 6 min read

Apache Hive Apache Impala Apache Spark CDP Private Cloud CDP Public Cloud Cloudera Data Platform (CDP)

August 26, 2021 | Technical

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint and is designed to work seamlessly with enterprise scale data warehousing, machine learning and streaming workloads. The […]

by Cloudera, Aravindan Vijayan 7 min read

Apache Hadoop Apache Hive Apache Impala Apache NiFi Apache Ozone Apache Spark Ozone CDP Private Cloud Cloudera Data Science Workbench Machine Learning Data Ingestion Data Science Machine Learning

June 25, 2021 | Technical

Migrate Hive data from CDH to CDP public cloud

Many Cloudera customers are making the transition from being completely on-prem to cloud by either backing up their data in the cloud, or running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments. Using easy-to-define policies, Replication Manager solves […]

by Shailesh Shiwalkar 7 min read

Apache Hive Apache Impala Apache Ranger Apache Sentry Migrate to CDP CDP Public Cloud Cloudera Data Platform (CDP) Cloudera Enterprise Data Engineering Data Hub SDX Technologies Data Ingestion Governance Metadata & Lineage

January 21, 2021 | Technical

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the […]

by Cloudera 14 min read

Apache Hive Apache Impala Hue Cloudera Data Platform (CDP) Data Warehouse Education Energy & Utilities Financial Services Healthcare & Life Sciences Insurance Manufacturing & Automotive Public Sector Retail, Ecommerce & Consumer Products Technology Telecommunications Customer Analytics Modernize Architecture Performance

January 15, 2021 | Technical

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. With the ability to quickly provision on-demand and the lower fixed and administrative costs, the costs of operating a cloud data warehouse are driven mostly by the price-performance of the specific […]

by David Rorke, Tim Armstrong 6 min read

Apache Hive Apache Impala Cloud Data Warehouse CDP Private Cloud CDP Public Cloud Cloudera Data Platform (CDP) Data Warehouse Data Ingestion Data Science Modernize Architecture

January 15, 2021 | Technical

Optimized joins & filtering with Bloom filter predicate in Kudu

In database systems one of the most effective ways to improve performance is to avoid doing unnecessary work, such as network transfers and reading data from disk. One of the ways Apache Kudu achieves this is by supporting column predicates with scanners. Pushing down column predicate filters to Kudu allows for optimized execution by […]

by Cloudera 5 min read

Apache Impala Apache Kudu CDP Private Cloud CDP Public Cloud Performance
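
To make the teaser above more concrete, here is a minimal, generic sketch of how a Bloom filter built from the small side of a join can discard non-matching rows from the large side before they are transferred. It does not use the Kudu or Impala APIs; the `BloomFilter` class, its sizes, and the sample data are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not Kudu's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Build the filter from the small join side, then apply it to the large side.
small_side_keys = [3, 7, 42]
bloom = BloomFilter()
for key in small_side_keys:
    bloom.add(key)

large_side_rows = [(k, f"row-{k}") for k in range(100)]
survivors = [row for row in large_side_rows if bloom.might_contain(row[0])]
```

A Bloom filter never produces false negatives, so every matching row survives; occasional false positives are harmless because the join itself still checks the keys exactly.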

January 13, 2021 | Business

2020 Data Impact Award Winner Spotlight: United Overseas Bank

2020 was a year of immense change and disruption. Despite the challenges, 2020 also provided positive opportunities for forward leaps to be made in the realm of digital transformation. At Cloudera, an example of this leap is our first virtual Data Impact Awards, which was held in November last year. One of our stand out […]

by Cloudera 2 min read

Apache Hive Apache Impala Business Analytics Cloudera Data Platform (CDP) Cloudera Data Science Workbench Data Warehouse Financial Services Customer Analytics
