Category Archives: Parquet

Impala Performance Update: Now Reaching DBMS-Class Speed

Categories: General, Hive, Impala, Parquet

Impala’s speed now beats the fastest SQL-on-Hadoop alternatives. Test for yourself!

Since the initial beta release of Cloudera Impala more than one year ago (October 2012), we’ve been committed to regularly updating you about its evolution into the standard for running interactive SQL queries across data in Apache Hadoop and Hadoop-based enterprise data hubs. To briefly recap where we are today:

  • Impala is being widely adopted.

Parquet at Salesforce.com

Categories: Guest, Impala, Parquet

The following Parquet blog post was originally published by Salesforce.com Lead Engineer and Apache Pig Committer Prashant Kommireddi (@pRaShAnT1784). Prashant has kindly given us permission to re-publish it below. Parquet is an open source columnar storage format co-founded by Twitter and Cloudera.

Parquet is a columnar storage format for Apache Hadoop that uses the concept of repetition/definition levels borrowed from Google Dremel.
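
To give a rough feel for what repetition and definition levels do, here is a minimal sketch in Python. It is not Parquet's implementation and uses no Parquet library. It assumes the simplest possible schema, a single top-level repeated field, and the function names `encode_repeated_column` and `decode_repeated_column` are hypothetical names chosen for this illustration. Each value is stored with a repetition level (0 starts a new record, 1 continues the current list) and a definition level (1 means a value is present, 0 means the list was empty).

```python
def encode_repeated_column(records):
    """Flatten a list of lists into (value, repetition_level, definition_level) triples.

    Assumed schema: one top-level repeated field, so both levels are 0 or 1.
    """
    triples = []
    for values in records:
        if not values:
            # Empty list: emit a placeholder so the record boundary is not lost.
            triples.append((None, 0, 0))
            continue
        for i, value in enumerate(values):
            # The first value of a record gets repetition level 0,
            # every later value in the same list gets 1.
            triples.append((value, 0 if i == 0 else 1, 1))
    return triples


def decode_repeated_column(triples):
    """Rebuild the original list of lists from the flattened triples."""
    records = []
    for value, repetition_level, definition_level in triples:
        if repetition_level == 0:
            records.append([])          # a new record starts here
        if definition_level == 1:
            records[-1].append(value)   # only defined values are kept
    return records


if __name__ == "__main__":
    column = encode_repeated_column([["a", "b"], [], ["c"]])
    for triple in column:
        print(triple)
    print(decode_repeated_column(column))
```

The point of the two small integers is that the column can be stored as one flat sequence of values while still carrying enough information to reassemble the nested records. Deeper nesting uses larger level values, which this sketch does not cover.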

Announcing Parquet 1.0: Columnar Storage for Hadoop

Categories: Community, Guest, Hadoop, Impala, Parquet

We’re very happy to re-publish the following post from Twitter analytics infrastructure engineering manager Dmitriy Ryaboy (@squarecog).

In March we announced the Parquet project, the result of a collaboration between Twitter and Cloudera intended to create an open-source columnar storage format library for Apache Hadoop.

Today, we’re happy to tell you about a significant Parquet milestone: a 1.0 release, which includes major features and improvements made since the initial announcement.