A Hive-on-Spark beta is now available via CDH parcel. Give it a try!
The Hive-on-Spark project (HIVE-7292) is one of the most watched projects in Apache Hive history. It has attracted developers from across the ecosystem, including from organizations such as Intel, MapR, IBM, and Cloudera, and gained critical help from the Spark community.
Many eager users have asked about its availability in the last few months. Some even built Hive-on-Spark from the branch code, tried it in their testing environments, and then gave us valuable feedback. The team is thrilled to see this level of excitement and early adoption, and has been working around the clock to deliver the product at an accelerated pace.
Thanks to this hard work, significant progress has been made in the last six months. (The project is currently incubating in Cloudera Labs.) All major functionality is now in place, including different flavors of joins and integration with Spark, HiveServer2, and YARN. The team has also made initial but important investments in performance optimization, including split generation and grouping, support for vectorization and cost-based optimization, and more. We are currently focused on running benchmarks, identifying and prototyping optimization areas such as dynamic partition pruning and table caching, and creating a roadmap for further performance enhancements in the near future.
Two months ago, we announced the availability of an Amazon Machine Image (AMI) for a hands-on experience. Today, we are even prouder to present a Hive-on-Spark beta via CDH parcel. You can download that parcel here. (Please note that in this beta release only HDFS, YARN, Apache ZooKeeper, and Hive are supported. Other components, such as Apache Pig, Apache Oozie, and Impala, might not work as expected.) The “Getting Started” guide will help you get your Hive queries up and running on the Spark engine without much trouble.
We will update you again when GA is available. Stay tuned!
Xuefu Zhang is a software engineer at Cloudera and a Hive PMC member.