Category Archives: Hive

Improving Query Performance Using Partitioning in Apache Hive

Categories: Guest Hive

Our thanks to Rakesh Rao of Quaero, for allowing us to re-publish the post below about Quaero’s experiences using partitioning in Apache Hive.

In this post, we will talk about how we can use the partitioning features available in Hive to improve performance of Hive queries.

Partitions

Hive is a good tool for performing queries on large datasets, especially datasets that require full table scans. But quite often there are instances where users need to filter the data on specific column values.

Read More

Apache Hive on Apache Spark: Motivations and Design Principles

Categories: Community Hive Spark

Two of the most vibrant communities in the Apache Hadoop ecosystem are now working together to bring users a Hive-on-Spark option that combines the best elements of both.

(Editor’s note [April 12, 2016]: Hive-on-Spark is now GA/ready for production as of CDH 5.7.)

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only execution engine in the Hadoop ecosystem,

Read More

How-to: Configure JDBC Connections in Secure Apache Hadoop Environments

Categories: Hive How-to Impala Platform Security & Cybersecurity

Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.

In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:

Read More

Using Impala at Scale at Allstate

Categories: Guest Hive Impala Parquet Use Case

Our thanks to Don Drake (@dondrake), an independent technology consultant who is currently working as a Principal Big Data Consultant at Allstate Insurance, for the guest post below about his experiences with Impala.

It started with a simple request from one of the managers in my group at Allstate to put together a demo of Tableau connecting to Cloudera Impala. I had previously worked on Impala with a large dataset about a year ago while it was still in beta,

Read More

Bringing the Best of Apache Hive 0.13 to CDH Users

Categories: CDH Hive

More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.

Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!

Furthermore, thanks to continual feedback from customers about their needs, we were able to test and make more than 300 Hive 0.13 fixes and stable features generally available via CDH 5.0.0,

Read More