Apache Parquet Archives - Cloudera Blog

November 13, 2020 | Technical

Keeping Small Queries Fast – Short query optimizations in Apache Impala

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety […]

by Cloudera, Tim Armstrong, Justin Hayes 15 min read

October 20, 2020 | Technical

New Multithreading Model for Apache Impala

Introduction Today we are introducing a new series of blog posts that will take a look at recent enhancements to Apache Impala. Many of these are performance improvements, such as the feature described below which will give anywhere from a 2x to 7x performance improvement by taking better advantage of all the CPU cores. In […]

by Cloudera, David Rorke, Shant Hovsepian, Justin Hayes 12 min read

Apache Impala Apache Parquet Cloud Cloud Data Warehouse Cloudera Data Platform Data Warehouse Modernize Architecture Ops and DevOps

January 21, 2020 | Technical

Speeding Up SELECT Queries with Parquet Page Indexes

Analytical SQL engines like Apache Impala are great for large table scans and aggregation query workloads. Individual tables in the big data ecosystem can reach petabytes in size, so achieving fast query response times requires intelligent filtering of table data based on conditions in the WHERE or HAVING clauses. Typically, you partition large tables using […]

by Zoltán Borók-Nagy, Gábor Szádovszky 7 min read

Apache Impala Apache Parquet Cloudera Data Platform Data Warehouse

March 4, 2019 | Technical

Transparent Hierarchical Storage Management with Apache Kudu and Impala

When picking a storage option for an application it is common to pick a single storage option which has the most applicable features to your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. For that […]

by Cloudera 9 min read

Apache Impala Apache Kudu Apache Parquet Cloudera Enterprise

December 21, 2018 | Technical

Faster Swarms of Data: Accelerating Hive Queries with Parquet Vectorization

Background Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Features that improve Hive performance can significantly improve the overall utilization of resources on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, […]

by Cloudera, Santosh Kumar, Haifeng Chen, Cheng Xu, Wang Lifeng 5 min read

Apache Hive Apache Parquet Intel Partner

April 22, 2016 | Technical

Benchmarking Apache Parquet: The Allstate Experience

Our thanks to Don Drake (@dondrake), an independent technology consultant who is currently working at Allstate Insurance, for the guest post below about his experiences comparing use of the Apache Avro and Apache Parquet file formats with Apache Spark. Over the last few months, numerous hallway conversations, informal discussions, and meetings have occurred at Allstate […]

by Cloudera 6 min read

Apache Avro Apache Parquet Performance

February 20, 2014 | Technical

Native Parquet Support Comes to Apache Hive

Bringing Parquet support to Hive was a community effort that deserves congratulations! Previously, this blog introduced Parquet, an efficient ecosystem-wide columnar storage format for Apache Hadoop. As discussed in that blog post, Parquet encodes data extremely efficiently, as described in Google’s original Dremel paper. (For more technical details on the Parquet format read Dremel […]

by Cloudera 2 min read

Apache Hive Apache Impala Apache Parquet

January 13, 2014 | Technical

Impala Performance Update: Now Reaching DBMS-Class Speed

Impala’s speed now beats the fastest SQL-on-Hadoop alternatives. Test for yourself! Since the initial beta release of Cloudera Impala more than one year ago (October 2012), we’ve been committed to regularly updating you about its evolution into the standard for running interactive SQL queries across data in Apache Hadoop and Hadoop-based enterprise data hubs. To […]

by Cloudera 7 min read

Apache Hive Apache Impala Apache Parquet
