Category Archives: Hive

Cloudera at ACM SIGMOD/PODS 2019

Categories: Events, Hive


The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results and to exchange techniques, tools, and experiences. This year ACM SIGMOD/PODS will be held in Amsterdam, The Netherlands, from June 30th to July 5th, 2019, and Cloudera will be present at the conference, contributing to and learning from the broader research community.

Last year,


Partition Management in Hadoop

Categories: Hadoop, Hive

Guest blog post written by Adir Mashiach

In this post I’ll talk about the problem of Hive tables with a large number of small partitions and files, and describe my solution in detail.


A little background

In my organization, we keep a lot of our data in HDFS. Most of it is raw data, but a significant amount is the final product of many data enrichment processes.


Faster Swarms of Data: Accelerating Hive Queries with Parquet Vectorization

Categories: CDH, Hive, Parquet, Performance

Background

Apache Hive is a widely adopted data warehouse engine that runs on Apache Hadoop. Because so many workloads run through Hive, features that make it faster can significantly improve overall resource utilization on the cluster. Hive processes data using a chain of operators within the Hive execution engine. These operators are scheduled in the various tasks (for example, MapTask, ReduceTask, or SparkTask) of the query execution plan. Traditionally, these operators are designed to process one row at a time.
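The contrast between row-at-a-time operators and vectorized processing can be sketched roughly as follows. This is a minimal, hypothetical illustration: the names `VectorizationSketch`, `RowOperator`, `sumRowAtATime`, and `sumVectorized` are invented for this example and are not Hive's internal API, and the batch size of 1024 rows is an assumption. The row-mode pipeline pushes every value through a chain of operator calls, while the vectorized version runs each operator as a tight loop over a batch of column values, using a selection vector to record which rows passed the filter.

```java
import java.util.function.LongPredicate;

/**
 * Hypothetical sketch (not Hive's API): a "filter then sum" query executed
 * row-at-a-time versus over batches of column values.
 */
public class VectorizationSketch {

    // Assumed batch size for illustration only.
    static final int BATCH_SIZE = 1024;

    /** Row mode: each operator handles a single value and forwards it downstream. */
    interface RowOperator {
        void process(long value);
    }

    /** Terminal operator that accumulates a running sum. */
    static final class RowSum implements RowOperator {
        long total;

        public void process(long value) {
            total += value;
        }
    }

    /** Filter operator that forwards only values matching the predicate. */
    static final class RowFilter implements RowOperator {
        private final LongPredicate predicate;
        private final RowOperator child;

        RowFilter(LongPredicate predicate, RowOperator child) {
            this.predicate = predicate;
            this.child = child;
        }

        public void process(long value) {
            if (predicate.test(value)) {
                child.process(value);
            }
        }
    }

    /** Sums values greater than threshold, one row at a time through the operator chain. */
    static long sumRowAtATime(long[] data, long threshold) {
        RowSum sum = new RowSum();
        RowOperator pipeline = new RowFilter(v -> v > threshold, sum);
        for (long v : data) {
            pipeline.process(v); // one full call chain per row
        }
        return sum.total;
    }

    /** Same query, but each operator runs as a loop over a whole batch. */
    static long sumVectorized(long[] data, long threshold) {
        long total = 0;
        long[] column = new long[BATCH_SIZE];
        int[] selected = new int[BATCH_SIZE]; // selection vector: indices that passed the filter
        for (int start = 0; start < data.length; start += BATCH_SIZE) {
            int size = Math.min(BATCH_SIZE, data.length - start);
            System.arraycopy(data, start, column, 0, size); // load one column batch

            // Filter step: a tight loop that records qualifying row indices.
            int selectedCount = 0;
            for (int i = 0; i < size; i++) {
                if (column[i] > threshold) {
                    selected[selectedCount++] = i;
                }
            }

            // Sum step: iterate only over the selected rows.
            for (int j = 0; j < selectedCount; j++) {
                total += column[selected[j]];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] data = new long[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100;
        }
        System.out.println(sumRowAtATime(data, 49)); // prints 186250
        System.out.println(sumVectorized(data, 49)); // prints 186250
    }
}
```

Both methods compute the same result. In general, the batched form amortizes per-call overhead across many rows and hands the JVM simple loops that are easier to optimize, which is the broad motivation for vectorized execution; how Hive applies this to Parquet data is the subject of the full post.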


New in Cloudera Enterprise 6: Apache Hive 2.1

Categories: CDH, Hive

We recently released Cloudera Enterprise 6.0 featuring significant improvements across a number of core components. In this blog post, we’re going to focus on Apache Hive 2.1.

Hive’s Approach to Rebase: Stability and Quality Most Important

Prior to the release of Cloudera Enterprise 6.0, Cloudera’s supported platform included Apache Hive 1.1 augmented with numerous features, enhancements, and fixes from later Apache Hive releases, all of which were included only after meeting rigorous quality criteria.
