Cloudera Engineering Blog · Hive Posts

Download the Hive-on-Spark Beta

A Hive-on-Spark beta is now available via CDH parcel. Give it a try!

The Hive-on-Spark project (HIVE-7292) is one of the most watched projects in Apache Hive history. It has attracted developers from across the ecosystem, including from organizations such as Intel, MapR, IBM, and Cloudera, and gained critical help from the Spark community.

Apache Hive 1.0.0 Has Been Released

The Apache Hive PMC has recently voted to release Hive 1.0.0 (formerly known as Hive 0.14.1).

This release is recognition of the work the Apache Hive community has done over the past nine years and is continuing to do. The Apache Hive 1.0.0 release is a codebase that was expected to be released as 0.14.1 but the community felt it was time to move to a 1.x.y release naming structure.

Hands-on Hive-on-Spark in the AWS Cloud

Interested in Hive-on-Spark progress? This new AMI gives you a hands-on experience.

Nearly one year ago, the Apache Hadoop community began to embrace Apache Spark as a powerful batch-processing engine. Today, many organizations and projects are augmenting their Hadoop capabilities with Spark. As part of this shift, the Apache Hive community is working to add Spark as an execution engine for Hive. The Hive-on-Spark work is being tracked by HIVE-7292 which is one of the most popular JIRAs in the Hadoop ecosystem. Furthermore, three weeks ago, the Hive-on-Spark team offered the first demo of Hive on Spark.

Apache Hive on Apache Spark: The First Demo

The community effort to make Apache Spark an execution engine for Apache Hive is making solid progress.

Apache Spark is quickly becoming the programmatic successor to MapReduce for data processing on Apache Hadoop. Over the course of its short history, it has become one of the most popular projects in the Hadoop ecosystem, and is now supported by multiple industry vendors—ensuring its status as an emerging standard.

New in CDH 5.2: Apache Sentry Delegated GRANT and REVOKE

This new feature, jointly developed by Cloudera and Intel engineers, makes management of role-based security much easier in Apache Hive, Impala, and Hue.

Apache Sentry (incubating) provides centralized authorization for services and applications in the Apache Hadoop ecosystem, allowing administrators to set up granular, role-based protection on resources, and to review them in one place. Previously, Sentry only designated administrators to GRANT and REVOKE privileges on an authorizable object. In Apache Sentry 1.5.0 (shipping inside CDH 5.2), we have implemented a new feature (SENTRY-327) that allows admin users to delegate the GRANT privilege to other users using WITH GRANT OPTION. If a user has the GRANT OPTION privilege on a specific resource, the user can now grant the GRANT privilege to other users on the same resource. Apache Hive, Impala, and Hue have all been updated to take advantage of this new Sentry functionality.

Improving Query Performance Using Partitioning in Apache Hive

Our thanks to Rakesh Rao of Quaero, for allowing us to re-publish the post below about Quaero’s experiences using partitioning in Apache Hive.

In this post, we will talk about how we can use the partitioning features available in Hive to improve performance of Hive queries.

Partitions

Apache Hive on Apache Spark: Motivations and Design Principles

Two of the most vibrant communities in the Apache Hadoop ecosystem are now working together to bring users a Hive-on-Spark option that combines the best elements of both.

(Editor’s note [Feb. 25, 2015]: A Hive-on-Spark beta release is now available for download. Learn more here.)

How-to: Configure JDBC Connections in Secure Apache Hadoop Environments

Learn how HiveServer, Apache Sentry, and Impala help make Hadoop play nicely with BI tools when Kerberos is involved.

In 2010, I wrote a simple pair of blog entries outlining the general considerations behind using Apache Hadoop with BI tools. The Cloudera partner ecosystem has positively exploded since then, and the technology has matured as well. Today, if JDBC is involved, all the pieces needed to expose Hadoop data through familiar BI tools are available:

Using Impala at Scale at Allstate

Our thanks to Don Drake (@dondrake), an independent technology consultant who is currently working as a Principal Big Data Consultant at Allstate Insurance, for the guest post below about his experiences with Impala.

It started with a simple request from one of the managers in my group at Allstate to put together a demo of Tableau connecting to Cloudera Impala. I had previously worked on Impala with a large dataset about a year ago while it was still in beta, and was curious to see how Impala had improved since then in features and stability.

Bringing the Best of Apache Hive 0.13 to CDH Users

More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.

Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!

Older Posts