Justin Kestelyn, Author at Cloudera Blog

April 22, 2016 | Technical

Benchmarking Apache Parquet: The Allstate Experience

Our thanks to Don Drake (@dondrake), an independent technology consultant currently working at Allstate Insurance, for the guest post below about his experiences comparing the Apache Avro and Apache Parquet file formats with Apache Spark. Over the last few months, numerous hallway conversations, informal discussions, and meetings have occurred at Allstate […]

by Justin Kestelyn 6 min read

August 22, 2014 | Technical

Improving Query Performance Using Partitioning in Apache Hive

Our thanks to Rakesh Rao of Quaero for allowing us to republish the post below about Quaero’s experiences using partitioning in Apache Hive. In this post, we discuss how the partitioning features available in Hive can improve the performance of Hive queries. Partitions: Hive is a good tool for performing queries […]

by Justin Kestelyn 3 min read

Apache Hive
