Tag Archives: vectorization

Faster Swarms of Data : Accelerating Hive Queries with Parquet Vectorization

Categories: CDH Hive Parquet Performance

Background

Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Features that improve Hive performance can significantly improve the overall utilization of resources on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, ReduceTask, or SparkTask) of the query execution plan. Traditionally, these operators are designed to process one row at a time.

Read more