Coming Attractions: Apache Hive 0.8.0

The Apache Hive team is hard at work putting the finishing touches on the 0.8.0 release. While the release hasn’t reached the GA milestone yet, I think now would be a good time to start highlighting some of the new features and improvements that users can expect to find in this important update:

Bitmap Indexes

The infrastructure required to support table indexes was originally added in the 0.7.0 release, but at the time no viable indexing plugin was provided. Project contributors have remedied this situation in the 0.8.0 release with the inclusion of support for bitmap indexes. This is a very important addition to Hive since it promises to significantly increase the performance of queries on indexed tables. More information about Hive Table Indexes can be found in the original design document, as well as in the comments that accompany the Bitmap Index JIRA ticket.
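For readers new to the idea, here is a minimal sketch of what a bitmap index does conceptually. It is a toy illustration and not Hive's implementation: the class name `BitmapIndexSketch` and its methods are hypothetical, and it uses only `java.util.BitSet` from the JDK. The index keeps one bitmap per distinct column value, so an equality predicate becomes a bitmap lookup and an OR of predicates becomes a bitwise OR.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the bitmap-index idea; NOT Hive's actual implementation.
class BitmapIndexSketch {
    // One bitmap per distinct column value; bit i is set when row i holds that value.
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    BitmapIndexSketch(List<String> columnValues) {
        for (int row = 0; row < columnValues.size(); row++) {
            bitmaps.computeIfAbsent(columnValues.get(row), v -> new BitSet()).set(row);
        }
    }

    // Row numbers where column == value (empty bitmap if the value never occurs).
    BitSet rowsEqualTo(String value) {
        BitSet bits = bitmaps.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    // Row numbers where column == a OR column == b, computed as a bitwise OR.
    BitSet rowsEqualToAny(String a, String b) {
        BitSet result = rowsEqualTo(a);
        result.or(rowsEqualTo(b));
        return result;
    }

    public static void main(String[] args) {
        BitmapIndexSketch index =
                new BitmapIndexSketch(Arrays.asList("US", "DE", "US", "JP", "DE"));
        System.out.println(index.rowsEqualTo("US"));          // prints {0, 2}
        System.out.println(index.rowsEqualToAny("DE", "JP")); // prints {1, 3, 4}
    }
}
```

The appeal of this layout is that a query only has to read the rows whose bits are set, rather than scanning the whole table.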

TIMESTAMP datatype

In response to frequent requests from users, Hive 0.8.0 will include support for the SQL TIMESTAMP datatype. We anticipate that this addition will make it much easier to integrate third-party ETL and BI tools with Hive. More information about the TIMESTAMP type can be found in the original JIRA ticket as well as in the Hive Language Manual.
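As a rough illustration of why this matters for tool integration, the sketch below parses a timestamp string with the JDK's `java.sql.Timestamp`, the type that JDBC-based tools typically use for such values. It assumes the textual form `yyyy-mm-dd hh:mm:ss[.fffffffff]`; check the Hive Language Manual for the exact formats Hive accepts. The class name `TimestampSketch` is hypothetical and no Hive code is involved.

```java
import java.sql.Timestamp;

// Sketch only: shows the JDBC-side timestamp type, not Hive's own parsing code.
class TimestampSketch {
    // Parses text in the JDBC escape format yyyy-mm-dd hh:mm:ss[.fffffffff].
    // Throws IllegalArgumentException if the text is not in that format.
    static Timestamp parse(String text) {
        return Timestamp.valueOf(text);
    }

    public static void main(String[] args) {
        Timestamp ts = parse("2011-12-16 09:30:00.123456789");
        System.out.println(ts);            // prints 2011-12-16 09:30:00.123456789
        System.out.println(ts.getNanos()); // prints 123456789
    }
}
```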

Plugin Developer Kit

Extensibility has been one of Hive's key design goals from the start, and the project has consistently delivered on it by providing a rich variety of extension points, including User Defined Functions (UDFs), Serialization/Deserialization libraries (SerDes), StorageHandlers, and IndexHandlers. Until now, one of the big inconveniences facing extension writers has been the requirement that they have access to a complete Hive source build. The new Hive Plugin Developer Kit (PDK) seeks to relax this requirement by allowing developers to build and test extensions directly against a specific binary release of Hive. Currently the PDK targets only UDFs, but there are plans to eventually extend it to the other extension points, including SerDes and StorageHandlers. More information about the Plugin Developer Kit can be found on the PDK page on the Hive Wiki.
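To give a feel for how small a UDF can be, here is a hedged sketch of the core logic of one. The class name `ReverseStringSketch` is hypothetical. In a real plugin the class would extend `org.apache.hadoop.hive.ql.exec.UDF` and be compiled against the Hive jars; that base class is left out here so the snippet compiles with nothing but the JDK.

```java
// Sketch of UDF-style logic. A real Hive UDF would declare
// "extends org.apache.hadoop.hive.ql.exec.UDF"; omitted here (assumption:
// Hive locates the evaluate method on such a class by reflection).
class ReverseStringSketch {
    // Returns the input reversed; NULL in, NULL out, as SQL functions usually behave.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return new StringBuilder(input).reverse().toString();
    }

    public static void main(String[] args) {
        ReverseStringSketch udf = new ReverseStringSketch();
        System.out.println(udf.evaluate("hive")); // prints evih
    }
}
```

Keeping the logic in a plain method like this also makes it easy to unit test without starting Hive at all.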

JDBC Driver Improvements

The 0.8.0 release will also include several significant bug fixes and enhancements for the Hive JDBC driver. This module has received a lot of interest from the Hive user community due to the critical role it plays in enabling integrations between Hive and third-party ETL and BI tools.

I hope this quick overview has increased your interest in the next release of Hive. We expect the GA release of Hive 0.8.0 to drop sometime in the next couple of weeks, and look forward to getting feedback from the community about the new features described above.

8 Responses
  • David Phillips / November 21, 2011 / 7:12 PM

    The Plugin Developer Kit wouldn’t be needed at all if Hive published Maven artifacts. You can write UDFs or SerDes today using the artifacts from Cloudera:

    https://github.com/proofpoint/hive-serde

    Rather than providing a sample project that the user needs to copy and modify, Hive could have a Maven Archetype that generates a new project for the user.

  • sentono / November 23, 2011 / 5:41 AM

    Nice, this is new to me. I'll take a closer look at your blog to learn the details :)

  • Shahab / November 26, 2011 / 3:56 PM

    Will Hive 0.8.0 INSERT INTO support non-strict dynamic-partitioning?

  • JOE / December 06, 2011 / 5:54 AM

    Hi, when will Hive 0.8.0 be released?
    Thank you

  • Jon Zuanich / December 06, 2011 / 12:31 PM

    @Joe Apache Hive 0.8.0 should be released in the next couple weeks.

  • Mike / December 19, 2011 / 1:32 PM

    Hive 0.8 was released on Friday Dec 16. Yay!

    Any idea when the new version will make it into CDH and Cloudera Manager?

  • Carl Steinbach / December 19, 2011 / 1:38 PM

    @Mike: We’re planning to include Hive 0.8 in CDH4.
