Apache Hive 2.0 is Released

Categories: CDH Hive

The recently-released Apache Hive 2.0 contains some exciting improvements, many of which are already available in CDH 5.x.

Recently, the Apache Hive community announced Hive 2.0.0. This is a larger release compared to the previous one (covered here), with a lengthy list of new features (many experimental), enhancements, and bug fixes. Cloudera’s Hive team have been working with the community for months to drive toward this significant release.

Here are some of the highlights with respect to Apache Hive 2.0 (see the release notes for a complete list of features, improvements, and bug fixes):

New Functionality

Performance and Optimizations

Security

Usability, Supportability, and Stability

Many of the production-ready improvements above are already included, or are scheduled to be included, in the CDH 5.x line, including the HiveServer2 web UI, new metrics, improved Apache Parquet support, and Hive-on-Spark enhancements. Furthermore, the Hive 2.0 release enforces safer configurations and chooses better defaults for certain configurations. (It’s worth noting, however, that the release also contains code that is either no longer supported or on the path to deprecation, such as Hadoop 1, MapReduce (MR) as an execution engine, and Java 6.)

In conclusion, there is much to be excited about in the Hive 2.0 release, and Cloudera has already backported some of the more significant features and fixes into CDH 5.x. We look forward to working with the rest of the Hive community to further improve and stabilize new features and enhancements along the 2.x release line, and to bring those improvements to CDH users as they become production-ready.

Xuefu Zhang is a Software Engineer at Cloudera and a PMC member of Apache Hive.


14 responses on “Apache Hive 2.0 is Released”

  1. Steve Drill

    CDH 5.6.0 was just recently released and seems to have Hive 1.1 in it. So what are you saying? When will CDH have full Hive 2.0?

    1. Justin Kestelyn Post author

      Steve,

      CDH reflects a careful balance between stability and innovation. Thus, every CDH component consists of a current Apache release plus curated backports (added after extensive testing and certification). So, the Hive code in CDH 5.5/5.6 is Hive 1.1 plus the subset of Hive 2.0 features that are production-ready (as documented in this post). In this case, we do it that way because the “full” Hive 2.0 release contains alpha code as well as deprecated features, and we recommend neither for users just now.

    1. Justin Kestelyn Post author

      Megha,

      It’s not, no. There are some critical parts of HPL/SQL that depend on other features that are not yet production-ready, so it’s still being evaluated.

      What is your use case for this functionality? And are any aspects of it more important than others? Having such info is helpful.

      1. Michael

        When do you estimate that HPL/SQL will be supported?
        We have a use case where we need cursors: we have duplicate transactions that we need to eliminate in Hive.
        For example, Transaction A may have 4 duplicate rows. We need to open a cursor over Transaction A and its 4 duplicate rows.
        Then we compare row 1 to row 2 using complex logic. Let's say row 1 is the winner; we then compare row 1 to row 3. Let's say row 3 wins; we then compare row 3 to row 4, and row 3 wins again. We would keep row 3 of Transaction A, then fetch the next set of duplicate rows for Transaction B and determine which row to keep.
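The winner-stays procedure described above can be expressed as a left fold (reduce) over each transaction's group of duplicate rows, carrying the current winner forward. The Python sketch here only illustrates that logic under stated assumptions; it is not HPL/SQL and not a Hive API. The field names (`txn_id`, `row`, `updated_at`) and the `pick_winner` rule (the row with the later `updated_at` wins, ties keep the current winner) are hypothetical placeholders for the real "complex logic".

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def pick_winner(current, challenger):
    """Hypothetical stand-in for the real comparison logic:
    the challenger wins only if its updated_at is strictly later."""
    return challenger if challenger["updated_at"] > current["updated_at"] else current


def dedupe(rows, key="txn_id"):
    """Keep one surviving row per transaction by folding pick_winner over each
    group of duplicates in order: 1 vs 2, winner vs 3, winner vs 4, ..."""
    survivors = []
    # groupby only groups adjacent items, so sort first; sorted() is stable,
    # so rows keep their original order within each transaction.
    ordered = sorted(rows, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        survivors.append(reduce(pick_winner, group))
    return survivors


# Hypothetical sample data mirroring the example: Transaction A has 4 rows.
rows = [
    {"txn_id": "A", "row": 1, "updated_at": 10},
    {"txn_id": "A", "row": 2, "updated_at": 5},
    {"txn_id": "A", "row": 3, "updated_at": 20},
    {"txn_id": "A", "row": 4, "updated_at": 15},
    {"txn_id": "B", "row": 1, "updated_at": 7},
    {"txn_id": "B", "row": 2, "updated_at": 9},
]

result = dedupe(rows)
print([(r["txn_id"], r["row"]) for r in result])  # [('A', 3), ('B', 2)]
```

The fold works only because a row that loses once is never compared again, which matches the procedure described; if the real logic needed to see all duplicates at once, the whole group would have to be passed to the decision function instead.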

  2. venkat

    If HPL/SQL is not supported, can we install it manually from Apache Hive 2.0 using the following instructions?
    For Cloudera distributions, you can edit the hplsql file, remove all lines containing
      export "HADOOP_CLASSPATH=…"
    and add the following line:
      export "HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*"
    as described at http://www.hplsql.org/start

    1. Justin Kestelyn Post author

      As explained in this post, some production-ready Hive 2.0 features have already been backported to CDH 5.x (see the release notes for details). They do not include HPL/SQL, however.

  3. Serg

    CDH 5.10.0 still has Hive 1.1. When will CDH finally have full Hive 2.0? The same question was asked a year ago, but in relation to CDH 5.6.0. It seems skewed too much towards stability, doesn't it? Maybe it's time to consider innovation? At the very least, it should be Hive 1.2 by now.
