Bringing the Best of Apache Hive 0.13 to CDH Users
More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.
Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!
Furthermore, thanks to continual feedback from customers about their needs, we were able to test more than 300 Hive 0.13 fixes and stable features and make them generally available in CDH 5.0.0, which we released last month. Cloudera customers can therefore confidently use them in production right now, including:
- Native Parquet support
As we reported some time back, native support for Parquet, the open source, general-purpose, columnar storage format for the Apache Hadoop ecosystem, went upstream via HIVE-5783, in large part due to the efforts of Criteo engineers. Users of CDH 5.0.0 can therefore easily create Parquet tables in Hive and benefit from improved performance and compression.
- Scale and precision support for DECIMAL datatype
Per HIVE-3976, users can now specify the scale and precision of DECIMAL when creating a table.
- New CHAR datatype
CHAR datatypes are now supported in Hive (HIVE-5191), in addition to VARCHAR.
- Maven refactoring
Hive is Maven-ized (see HIVE-5610 for the merge to trunk) in CDH 5.0.0 for faster, easier builds.
- Public parallel testing framework
Cloudera proposed (HIVE-4739) to sponsor an open, public test cluster (on Amazon EC2) for the Hive community, and this environment is now available for users of Hive in CDH 5.0.0, as well as those of upstream Hive 0.13.
- SSL encryption for LDAP username/password
Per HIVE-5351, HiveServer2 supports encrypted communications via SSL with client drivers to enable secure LDAP username/password authentication as an alternative to Kerberos.
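As a rough illustration of the Parquet, DECIMAL, and CHAR items above, the following Python sketch renders the kind of HiveQL DDL those features enable. It only builds strings and never contacts Hive. The helper names (`decimal_type`, `char_type`, `create_table`) and the table and column names are hypothetical, and the limits it checks (DECIMAL precision up to 38, CHAR length up to 255) are the ones documented for upstream Hive, assumed here rather than verified for any particular CDH build.

```python
# Illustrative sketch only: renders HiveQL DDL strings; it does not talk to Hive.
# All table and column names are hypothetical.

# Limits documented for upstream Hive; treated as assumptions for CDH builds.
MAX_DECIMAL_PRECISION = 38
MAX_CHAR_LENGTH = 255


def decimal_type(precision, scale=0):
    """Return a HiveQL DECIMAL(precision,scale) type string.

    precision is the total number of digits; scale is how many of those
    digits fall to the right of the decimal point.
    """
    if not 1 <= precision <= MAX_DECIMAL_PRECISION:
        raise ValueError("precision must be between 1 and 38")
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and precision")
    return f"DECIMAL({precision},{scale})"


def char_type(length):
    """Return a HiveQL CHAR(length) type string (a fixed-length string)."""
    if not 1 <= length <= MAX_CHAR_LENGTH:
        raise ValueError("CHAR length must be between 1 and 255")
    return f"CHAR({length})"


def create_table(name, columns, stored_as=None):
    """Render a CREATE TABLE statement from (column_name, type) pairs."""
    cols = ",\n".join(f"  {col} {typ}" for col, typ in columns)
    ddl = f"CREATE TABLE {name} (\n{cols}\n)"
    if stored_as:
        ddl += f"\nSTORED AS {stored_as}"
    return ddl + ";"


# A text-format table using the new CHAR type and an explicit DECIMAL
# precision and scale.
orders_ddl = create_table(
    "orders",
    [
        ("order_id", "BIGINT"),
        ("country_code", char_type(2)),
        ("amount", decimal_type(10, 2)),
    ],
)

# A Parquet-backed table, limited to simple column types.
events_ddl = create_table(
    "events_parquet",
    [("event_id", "BIGINT"), ("payload", "STRING")],
    stored_as="PARQUET",
)

print(orders_ddl)
print(events_ddl)
```

The two printed statements could then be pasted into the Hive CLI or Beeline. The Parquet example deliberately sticks to BIGINT and STRING columns, because which of the newer datatypes the Parquet SerDe accepts in a given release is not covered here.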
Thanks to these ongoing backports — which give CDH users continual access to the best of upstream Hive code — it will also be much easier for those users to upgrade to future releases of Hive!
Hive: The Batch-Processing Spoke in the Enterprise Data Hub
To summarize, thanks to our ongoing effort to backport upstream Hive bits into CDH, CDH 5.0.0 users have access to many of the production-ready pieces of Hive 0.13. Furthermore, that functionality sits alongside differentiated components, so enterprise data hub users have access to the best possible tools for their workloads, whether that is Apache Spark for interactive analytics, Hive for batch processing, Impala for interactive SQL, or one of several other options.
Justin Kestelyn is Cloudera’s developer outreach director.