Category Archives: Hive

Apache Hive on Apache Spark: Motivations and Design Principles

Categories: Community, Hive, Spark

Two of the most vibrant communities in the Apache Hadoop ecosystem are now working together to bring users a Hive-on-Spark option that combines the best elements of both.

(Editor’s note [April 12, 2016]: Hive-on-Spark is now GA/ready for production as of CDH 5.7.)

Apache Hive is a popular SQL interface for batch processing and ETL using Apache Hadoop. Until recently, MapReduce was the only execution engine in the Hadoop ecosystem,

Read More

How-to: Configure JDBC Connections in Secure Apache Hadoop Environments

Categories: Hive, How-to, Impala, Platform Security & Cybersecurity

Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.

In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:

Read More

Using Impala at Scale at Allstate

Categories: Guest, Hive, Impala, Parquet, Use Case

Our thanks to Don Drake (@dondrake), an independent technology consultant who is currently working as a Principal Big Data Consultant at Allstate Insurance, for the guest post below about his experiences with Impala.

It started with a simple request from one of the managers in my group at Allstate to put together a demo of Tableau connecting to Cloudera Impala. I had worked with Impala on a large dataset about a year earlier, while it was still in beta,

Read More

Bringing the Best of Apache Hive 0.13 to CDH Users

Categories: CDH, Hive

More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.

Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release. Congratulations to all contributors involved!

Furthermore, thanks to continual feedback from customers about their needs, we were able to test and make more than 300 Hive 0.13 fixes and stable features generally available via CDH 5.0.0,

Read More

How Impala Brings Real-Time, Big Data Analytics to Digital Reasoning’s Users

Categories: Guest, Hive, Impala, Use Case

The following post, by Sarah Cannon of Digital Reasoning, was originally published on that company’s blog. Digital Reasoning has graciously permitted us to republish it here for your convenience.

At the beginning of each release cycle, engineers at Digital Reasoning are given time to explore the latest in Big Data technologies, examining how the frequently changing landscape might be best adapted to serve our mission. As we sat down in the early stages of planning for Synthesys 3.8, one of the biggest issues we faced involved reconciling the tradeoff between flexibility and performance.

Read More