Apache Hive 1.2.0, although not a major release, contains significant improvements.
Recently, the Apache Hive community moved to a more frequent, incremental release schedule. We previously covered the Apache Hive 1.0.0 release and explained how it was renamed from 0.14.1, with only minor feature additions since 0.14.0.
Last week, the community released Apache Hive 1.2.0. Although a narrower release than Hive 1.1.0, it nevertheless contains improvements in the following areas:
New Features
- Support for Apache Spark 1.3 (HIVE-9726), enabling dynamic executor allocation and impersonation
- Support for integration of Hive-on-Spark with Apache HBase (HIVE-10073)
- Support for numeric partition columns with literals (HIVE-10313, HIVE-10307)
- Support for Union Distinct (HIVE-9039)
- Support for specifying column list in insert statement (HIVE-9481)
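To make the last two items concrete, here is a minimal sketch of their semantics. It is not run against Hive: it uses Python's built-in `sqlite3` module as a stand-in engine, and the table and column names (`web_clicks`, `app_clicks`, `user_id`, `page`, `referrer`) are hypothetical. SQLite spells the distinct union as plain `UNION`, whereas in Hive 1.2.0 the same query can be written with `UNION DISTINCT`; the column-list `INSERT` form is close to what HIVE-9481 enables, with columns left out of the list receiving NULL.

```python
import sqlite3


def demo():
    """Illustrate UNION DISTINCT and INSERT-with-column-list semantics.

    Assumption: SQLite is only a stand-in for Hive here. In HiveQL the
    first query below would read `... UNION DISTINCT ...`; SQLite uses
    plain `UNION` for the same duplicate-eliminating behavior.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical tables, invented for this example.
    cur.execute("CREATE TABLE web_clicks (user_id INTEGER, page TEXT, referrer TEXT)")
    cur.execute("CREATE TABLE app_clicks (user_id INTEGER, page TEXT, referrer TEXT)")

    # INSERT with an explicit column list: `referrer` is omitted and
    # therefore stored as NULL. The list may also reorder the columns.
    cur.executemany(
        "INSERT INTO web_clicks (user_id, page) VALUES (?, ?)",
        [(1, "home"), (2, "cart")],
    )
    cur.executemany(
        "INSERT INTO app_clicks (page, user_id) VALUES (?, ?)",
        [("home", 1), ("search", 3)],
    )

    # Distinct union: the row (1, 'home') appears in both tables but
    # is returned only once.
    distinct_rows = cur.execute(
        "SELECT user_id, page FROM web_clicks "
        "UNION "
        "SELECT user_id, page FROM app_clicks "
        "ORDER BY user_id, page"
    ).fetchall()

    # UNION ALL keeps the duplicate row, for comparison.
    all_rows = cur.execute(
        "SELECT user_id, page FROM web_clicks "
        "UNION ALL "
        "SELECT user_id, page FROM app_clicks "
        "ORDER BY user_id, page"
    ).fetchall()

    # Columns left out of the INSERT column list come back as NULL (None).
    referrers = cur.execute("SELECT referrer FROM web_clicks").fetchall()

    conn.close()
    return distinct_rows, all_rows, referrers


if __name__ == "__main__":
    distinct_rows, all_rows, referrers = demo()
    print("distinct:", distinct_rows)
    print("all:", all_rows)
    print("referrers:", referrers)
```

The distinct union returns three rows where `UNION ALL` returns four, which is the difference HIVE-9039 makes expressible directly in a Hive query.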
Performance and Optimizations
- Enhanced use of Kryo serialization/deserialization for Hive-on-Spark (HIVE-9804, HIVE-9781)
- Predicate PushDown enhancements (HIVE-9069)
- Aggregating statistics for RDBMS-based Metastore (HIVE-10382)
- Apache Parquet performance improvements (HIVE-10252, HIVE-9558)
- Improved Windowing support (HIVE-10627, HIVE-10686)
Security
- Fixed LDAP vulnerability that affected previous releases (HIVE-9934)
- Encryption and log redaction (HIVE-9994, HIVE-9991)
Usability and Stability
- More stable and more usable Hive-on-Spark (HIVE-10143, HIVE-10023, HIVE-9847, HIVE-10009, HIVE-10291, HIVE-10209, and others)
- Metastore reliability and column stats improvement (HIVE-10384, HIVE-9720)
- Added another level of EXPLAIN for an RDBMS audience (HIVE-9780)
For a larger but still incomplete list of features, improvements, and bug fixes, see the release notes. (Most of the Hive-on-Spark JIRAs are missing from the list.)
The most important improvements and fixes above, such as those involving security, are already available in CDH 5.4.x releases. As another example, CDH users have been testing the Hive-on-Spark public beta since its first release, as well as improvements made to that beta in CDH 5.4.0.
We’re looking forward to working with the rest of the Apache Hive community to drive the project continually forward in the areas of SQL functionality, performance, security, and stability!
Xuefu Zhang is a Software Engineer at Cloudera and a PMC member of Apache Hive.