Congratulations to Parquet, Now an Apache Incubator Project
In its relatively short lifetime (co-founded by Twitter and Cloudera in July 2013), Parquet has already become the de facto standard for columnar storage of Apache Hadoop data — with native support in Impala, Apache Hive, Apache Pig, Apache Spark, MapReduce, Apache Tajo, Apache Drill, Apache Crunch, and Cascading (and forthcoming in Presto and Shark). Parquet adoption is also broad-based, with employees of the following companies (partial list) actively contributing:
- ARRIS Enterprises
- UC Berkeley AMPLab
…and usage of Parquet commercially supported by Cloudera, IBM, MapR, and Pivotal!
With this news, I thought it would be a good time to recap our coverage of Parquet to date for those of you in catch-up mode:
- How-to: Convert Existing Data into Parquet (May 19, 2014)
Learn how to convert your data to the Parquet columnar format to get big performance gains.
- Using Impala at Scale at Allstate (May 15, 2014)
A guest post about Allstate Insurance’s experiences with Impala and Parquet.
- Using Apache Hadoop and Impala with MySQL for Data Analysis (April 25, 2014)
Alexander Rubin of Percona describes a MySQL/Impala/Parquet use case.
- How-to: Use Parquet with Impala, Hive, Pig, and MapReduce (March 21, 2014)
The CDH software stack lets you use your tool of choice with the Parquet file format, offering the benefits of columnar storage at each phase of data processing.
- Native Parquet Support Comes to Apache Hive (February 20, 2014)
Bringing Parquet support to Hive was a community effort that deserves congratulations!
- Impala Performance Update: Now Reaching DBMS-Class Speed (January 13, 2014)
Impala’s speed (on Parquet) now beats the fastest SQL-on-Hadoop alternatives. Test for yourself!
- Parquet at Salesforce.com (October 22, 2013)
Salesforce.com Lead Engineer and Pig committer Prashant Kommireddi describes the Parquet use case at Salesforce.com.
- Announcing Parquet 1.0: Columnar Storage for Hadoop (July 30, 2013)
The initial announcement from Twitter analytics infrastructure engineering manager Dmitriy Ryaboy.
Congrats again, Parquet People!
Justin Kestelyn is Cloudera’s developer outreach director.