Congratulations to Parquet, Now an Apache Incubator Project

Yesterday, Parquet was accepted into the Apache Incubator. Congratulations to all the contributors to what will eventually become Apache Parquet!

In its relatively short lifetime (co-founded by Twitter and Cloudera in July 2013), Parquet has already become the de facto standard for columnar storage of Apache Hadoop data — with native support in Impala, Apache Hive, Apache Pig, Apache Spark, MapReduce, Apache Tajo, Apache Drill, Apache Crunch, and Cascading (and forthcoming in Presto and Shark). Parquet adoption is also broad-based, with employees of the following companies (partial list) actively contributing:

  • ARRIS Enterprises
  • Cloudera
  • Criteo 
  • Netflix
  • Salesforce.com
  • Seznam
  • Stripe
  • Twitter
  • UC Berkeley AMPLab

…and commercial support for Parquet is available from Cloudera, IBM, MapR, and Pivotal!

With this news, I thought it would be a good time to recap our coverage of Parquet to date for those of you in catch-up mode:

Congrats again, Parquet People! 

 Justin Kestelyn is Cloudera’s developer outreach director.

1 Response
  • Ravi C / October 14, 2014 / 7:16 AM

    Good news! Thanks.
    Currently, column-level compression is not supported, but file-level compression can be specified. Do you know whether, once column-level compression is supported, file-level compression will be deprecated or will continue to be supported?
