Author Archives: Carl Steinbach

Coming Attractions: Apache Hive 0.8.0

Categories: General, Hadoop, Hive

The Apache Hive team is hard at work putting the finishing touches on the 0.8.0 release. While the release hasn’t reached the GA milestone yet, I think now would be a good time to start highlighting some of the new features and improvements that users can expect to find in this important update:

Bitmap Indexes

The infrastructure required to support table indexes was originally added in the 0.7.0 release, but at the time no viable indexing plugin was provided.


What’s New in CDH3b2: Apache Hive

Categories: General, Hive

CDH3 beta 2 includes Apache Hive 0.5.0, the latest version of the popular open source Apache Hadoop data warehouse platform. Hive allows you to express data analysis tasks in a dialect of SQL called HiveQL, and then compiles these tasks into MapReduce jobs and executes the jobs on your Hadoop cluster. Hive is a natural entry point to Hadoop for people who have prior experience with relational databases,


What’s New in CDH3b2: Pig

Categories: General, Pig

CDH3 beta 2 includes Apache Pig 0.7.0, the latest and greatest version of the popular dataflow programming environment for Hadoop. In this post I’ll review some of the bigger changes that went into Pig 0.7.0, describe the motivations behind these changes, and explain how they affect users. Readers in search of a canonical list of changes in this new version of Pig should consult the Pig 0.7.0 Release Notes as well as the list of backward incompatible changes.


Integrating Apache Hive and Apache HBase

Categories: Guest, HBase, Hive

This post was contributed by John Sichi, a committer on the Apache Hive project and a member of the Data Infrastructure team at Facebook.

As many readers may already know, Hive was initially developed at Facebook to deal with explosive growth in our multi-petabyte data warehouse. Since its release as an Apache project, it has been put into use at a number of other companies to solve big data problems.