The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. This year, ACM SIGMOD/PODS will be held in Amsterdam, The Netherlands, from June 30th to July 5th, 2019. Cloudera will be at the conference, contributing to and learning from the broader research community.

At last year’s conference, Apache Hive was recognized with the SIGMOD Software Systems Award “for developing seminal software systems that served to bring relational-style declarative programming to the Hadoop ecosystem”. Apache Hive’s early success stemmed from its ability to exploit parallelism for batch operations through a well-known SQL-like interface. It made data loading and management simple, and it handled node, software, and hardware failures gracefully, without expensive repair or recovery times.

At this year’s edition, we are presenting a paper that describes the key innovations on Apache Hive’s journey from a batch processing tool to a fully fledged enterprise SQL data warehousing system. In particular, we describe how our team, as part of the Apache Hive community, has expanded the utility of the system over the years by (i) adding the row-level transactional capabilities required for data modifications in star schema databases, (ii) introducing optimization techniques for handling today’s view hierarchies and big-data operations, (iii) implementing the runtime improvements necessary to bring query latency and concurrency into the realm of interactive operation, and (iv) laying the groundwork for using Apache Hive as a relational front-end to multiple storage and data systems. All of these enhancements were introduced without compromising the original characteristics that made the system popular. We also provide an outlook on the project’s roadmap.

If you are attending the conference, we hope you will get in touch with us to discuss our work, as well as opportunities to contribute to Apache Hive and other projects in this thrilling open-source ecosystem. See you in Amsterdam!

Paper: Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

Jesús Camacho Rodríguez, Ashutosh Chauhan, Alan Gates, Eugene Koifman, Owen O’Malley, Vineet Garg, Zoltan Haindrich, Sergey Shelukhin, Prasanth Jayachandran, Siddharth Seth, Deepak Jaiswal, Slim Bouguerra, Nishant Bangarwa, Sankar Hariappan, Anishek Agarwal, Jason Dere, Daniel Dai, Thejas Nair, Nita Dembla, Gopal Vijayaraghavan, and Günther Hagleitner

Jesús Camacho Rodríguez

Principal Software Engineer I

Slim Bouguerra

Staff Software Engineer
