Bringing the Best of Apache Hive 0.13 to CDH Users

More than 300 bug fixes and stable features in Apache Hive 0.13 have already been backported into CDH 5.0.0.

Last week, the Hive community voted to release Hive 0.13. We’re excited about the continued efforts and progress in the project and the latest release — congratulations to all contributors involved!

Furthermore, thanks to continual feedback from customers about their needs, we were able to test and make more than 300 Hive 0.13 fixes and stable features generally available via CDH 5.0.0, which we released last month. Thus, Cloudera customers can confidently take advantage of them in production right now, including:

  • Native Parquet support
    As we reported some time back, native support for Parquet, the open source, general-purpose, columnar storage format for the Apache Hadoop ecosystem, went upstream via HIVE-5783, in large part due to the efforts of Criteo engineers. Users of CDH 5.0.0 can therefore easily create Parquet tables in Hive and benefit from improved performance and compression.
  • Scale and precision support for DECIMAL datatype
    Per HIVE-3976, users can now specify the scale and precision of DECIMAL when creating a table.
  • New CHAR datatype
    CHAR datatypes are now supported in Hive (HIVE-5191), in addition to VARCHAR.
  • Maven refactoring
    Hive is Maven-ized (see HIVE-5610 for the merge to trunk) in CDH 5.0.0 for faster, easier builds.
  • Public parallel testing framework
    Cloudera proposed (HIVE-4739) to sponsor an open, public test cluster (on Amazon EC2) for the Hive community, and this environment is now available for users of Hive in CDH 5.0.0, as well as those of upstream Hive 0.13. 
  • SSL encryption for LDAP username/password
    Per HIVE-5351, HiveServer2 supports encrypted communications via SSL with client drivers to enable secure LDAP username/password authentication as an alternative to Kerberos.
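
To make a few of the items above concrete, here is a minimal HiveQL sketch of a Parquet table and of the new DECIMAL and CHAR declarations. The table and column names are hypothetical and chosen only for illustration. The two features are shown in separate tables on purpose, because which column types the Parquet integration accepts in a given release is an assumption you should verify against your own deployment.

```sql
-- Sketch only: all table and column names below are hypothetical.

-- Native Parquet support (HIVE-5783): name the storage format in the DDL.
CREATE TABLE page_views_parquet (
  user_id  BIGINT,
  url      STRING,
  duration INT
)
STORED AS PARQUET;

-- DECIMAL with explicit precision and scale (HIVE-3976) and the
-- fixed-length CHAR type (HIVE-5191), here in a default-format table.
CREATE TABLE orders (
  order_id     BIGINT,
  country_code CHAR(2),        -- fixed length of 2 characters
  amount       DECIMAL(10,2)   -- precision 10, scale 2
);
```

In DECIMAL(10,2), the first argument is the precision (total number of digits) and the second is the scale (digits after the decimal point).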

Thanks to these ongoing backports — which give CDH users continual access to the best of upstream Hive code — it will also be much easier for those users to upgrade to future releases of Hive!

Hive: The Batch-Processing Spoke in the Enterprise Data Hub

To summarize, as part of our ongoing effort to backport upstream Hive bits into CDH, CDH 5.0.0 users have access to many of the production-ready pieces of Hive 0.13. Furthermore, that functionality is present alongside differentiated components to ensure that enterprise data hub users have access to the best possible tools for their workloads, whether it be Apache Spark for interactive analytics, Hive for batch processing, Impala for interactive SQL, or multiple other options.

Justin Kestelyn is Cloudera’s developer outreach director.

5 Responses
  • Dattu I / May 01, 2014 / 5:48 PM

    We are looking for Hive 0.13.0 support in CDH 5. Our tests are failing with a Kryo serialization exception. We noticed that Hive 0.13.0 fixed this under HIVE-5279. Since CDH 5 already has Hive 0.13.0 backports, it should include the HIVE-5279 fix too.

    Please advise us where we can get the latest CDH 5 backported code for Hive 0.13.0.

    Greatly appreciate your help.

    Thanks
    D

  • Pavel Burdanov / May 02, 2014 / 10:37 PM

    Hive 0.13 has fixed a serious bug, HIVE-5994, involving large bigints and zigzag encoding. As far as I know, it has not been backported to CDH5 yet.

  • Venkat Ankam / June 16, 2014 / 9:10 AM

    Which CDH version will have HIVE-5317 (insert, update, and delete in Hive)?

    Regards,
    Venkat
