This past Monday marked the official release of Apache Hive 0.9.0. Users interested in taking this release of Hive for a spin can download a copy from the Apache archive site. The following post is a quick summary of new features and improvements users can expect to find in this update of the popular data warehousing system for Hadoop.
The 0.9.0 release continues the trend of extending Hive’s SQL support. Hive now understands the BETWEEN operator and the NULL-safe equality operator (<=>), and several new user-defined functions (UDFs) have been added, including printf(), sort_array(), and java_method(). In addition, the concat_ws() function now accepts arrays of strings as input parameters.
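For readers who have not used a NULL-safe equality operator before, here is a minimal Python sketch of how it differs from ordinary equality, and of how BETWEEN behaves under standard SQL three-valued logic. It models the semantics only and is not Hive code: None stands in for SQL NULL, and the function names (sql_eq, null_safe_eq, sql_and, between) are hypothetical helpers invented for this illustration.

```python
def sql_eq(a, b):
    """Ordinary SQL '=': the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """NULL-safe equality (written <=> in HiveQL): never yields NULL.

    Two NULLs compare as equal; a NULL and a non-NULL value compare as unequal.
    """
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_and(p, q):
    """Three-valued AND: False dominates, otherwise NULL propagates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def between(x, lo, hi):
    """x BETWEEN lo AND hi, modeled as (x >= lo) AND (x <= hi), bounds inclusive."""
    ge = None if x is None or lo is None else x >= lo
    le = None if x is None or hi is None else x <= hi
    return sql_and(ge, le)


if __name__ == "__main__":
    print(sql_eq(None, None))        # ordinary '=' on two NULLs
    print(null_safe_eq(None, None))  # NULL-safe comparison of two NULLs
    print(between(5, 1, 10))         # value inside an inclusive range
```

The practical upshot is that a NULL-safe comparison in a JOIN or WHERE clause can match rows where both sides are NULL, which ordinary equality never does.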
This Hive release also includes several significant improvements to the query compiler and execution engine. HIVE-2642 improved Hive’s ability to optimize UNION queries, HIVE-2881 made the map-side JOIN algorithm more efficient, and HIVE-2621 significantly improved Hive’s ability to generate optimized execution plans for queries that contain multiple GROUP BY clauses.
HBase users will also be interested in several improvements to Hive’s HBase StorageHandler, most notably:
- The ability to access primitive types stored in binary format within HBase (HIVE-1634)
- Support for filter pushdown on keys (HIVE-2861, HIVE-2815, HIVE-2771)
Finally, I’d like to commend Ashutosh Chauhan on a job well done as the release manager for Hive 0.9.0. Ashutosh became a Hive committer six months ago and since then has had a significant impact on the project by doing lots of code reviews, helping answer questions on the mailing list, and through continued patch submissions. He did a great job as a first-time release manager, and I hope that he will reprise this role in the future!