Apache Impala Archives - Page 2 of 4

November 13, 2020 | Technical

Keeping Small Queries Fast – Short query optimizations in Apache Impala

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety […]

by Shant Hovsepian, Tim Armstrong, Justin Hayes 15 min read

November 2, 2020 | Business

An Overview of Real Time Data Warehousing on Cloudera

Users today are asking ever more from their data warehouse. This is resulting in advancements of what is provided by the technology, and a resulting shift in the art of the possible. As an example of this, in this post we look at Real Time Data Warehousing (RTDW), which is a category of use cases […]

by Justin Hayes 8 min read

Apache Druid Apache Flink Apache Hive Apache Impala Apache Kafka Apache Kudu Business Analytics Cloudera Data Platform (CDP) Data Warehouse

October 20, 2020 | Technical

New Multithreading Model for Apache Impala

Introduction: Today we are introducing a new series of blog posts that will take a look at recent enhancements to Apache Impala. Many of these are performance improvements, such as the feature described below which will give anywhere from a 2x to 7x performance improvement by taking better advantage of all the CPU cores. In […]

by Tim Armstrong, David Rorke, Shant Hovsepian, Justin Hayes 12 min read

Apache Impala Apache Parquet Cloud Cloud Data Warehouse Cloudera Data Platform (CDP) Data Warehouse Modernize Architecture Ops and DevOps

September 24, 2020 | Technical

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Aren’t two superheroes better than one? Some of the most powerful results come from combining complementary superpowers, and the “dynamic duo” of Apache Hive LLAP and Apache Impala, both included in Cloudera Data Warehouse, is further evidence of this. Both Impala and Hive can operate at an unprecedented and massive scale, with many petabytes of […]

by David Dichmann 4 min read

Apache Hive Apache Impala Cloudera Data Platform (CDP) Data Warehouse Data Ingestion Data Science Governance

August 17, 2020 | Technical

Enabling Automated Issue Resolution through the use of conversational ML

Intro: The Cloudera Support Organization has always strived to not only provide solutions to our customers but to also deliver helpful knowledge. One of the primary sources of that knowledge comes from our Knowledge Articles. This content is created and curated by our knowledgeable Support Staff based on real-world experience coming from support cases. These […]

by Jacob Davis 6 min read

Apache Impala Apache Spark Machine Learning Cloudera Data Platform (CDP) Cloudera Data Science Workbench Data Warehouse Machine Learning Technology Machine Learning Search

June 18, 2020 | Technical

Build on your investment by Migrating or Upgrading to CDP Data Center

Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here. Cloudera Data Platform (CDP) Data Center (DC) is the on-premises release of Cloudera Data Platform. CDP DC combines the best services and components from Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise along with […]

by Karthik Krishnamoorthy 3 min read

Apache Atlas Apache Hive Apache Impala Apache Kudu Apache Ranger Enterprise data cloud CDP Private Cloud Cloudera Data Platform (CDP) Cloudera Enterprise Hortonworks Data Platform Modernize Architecture

May 5, 2020 | Business

Bloor Research identifies what makes a Modern Data Warehouse champion

When speaking with customers, I often hear that they are committed to digital transformation and being a data-driven enterprise. Those may just seem like abstract, lofty words to aspire to but the reality is much more practical. We have major banks needing to ensure that they have a complete view of their customers, and can […]

by David Dichmann 3 min read

Apache Hive Apache Impala Business Analytics Cloud Cloudera Data Platform (CDP) Data Warehouse Modernize Architecture

April 6, 2020 | Business

Hadoop: Decade Two, Day Zero*

This blog was originally published on Medium. The Data Cloud — Powered By Hadoop: One key aspect of the Cloudera Data Platform (CDP), which is just beginning to be understood, is how much of a recombinant-evolution it represents, from an architectural standpoint, vis-à-vis Hadoop in its first decade. I’ve been having a blast showing CDP to […]

by Arun Murthy 6 min read

Apache Hadoop Apache Hive Apache Impala Apache Yarn Cloud Hybrid Cloud CDP Private Cloud CDP Public Cloud Cloudera Data Platform (CDP) Data Hub Hortonworks Data Platform Data Science Modernize Architecture

January 21, 2020 | Technical

Speeding Up SELECT Queries with Parquet Page Indexes

Analytical SQL engines like Apache Impala are great for large table scans and aggregation query workloads. Individual tables in the big data ecosystem can reach petabytes in size, so achieving fast query response times requires intelligent filtering of table data based on conditions in the WHERE or HAVING clauses. Typically, you partition large tables using […]

by Zoltán Borók-Nagy, Gábor Szádovszky 7 min read

Apache Impala Apache Parquet Cloudera Data Platform (CDP) Data Warehouse

December 20, 2019 | Business

SQL Analytics at Scale: Selecting the Right SQL Engine for the Right Job

We are all hungry for data. Not just more data… also new types of data so that we can best understand our products, customers, and markets. We are looking for real-time insight on the newest available data in all shapes and sizes, structured and unstructured. We want to embrace the new generation of business and […]

by Sagar Kewalramani 6 min read

Apache Hive Apache Impala Apache Spark Data Warehouse Modernize Architecture
