Category Archives: Impala

Inside Cloudera Impala: Runtime Code Generation

Categories: Hadoop Hive Impala

Cloudera Impala, the open-source real-time query engine for Apache Hadoop, uses many tools and techniques to get the best query performance. This blog post will discuss how we use runtime code generation to significantly improve our CPU efficiency and overall query execution time. We’ll explain the types of inefficiency that code generation eliminates and look in more detail at one of the queries in the TPC-H workload, where code generation improves overall query speed by close to 3x.

Read more

From Zero to Impala in Minutes

Categories: Cloud Guest How-to Impala

This post was originally published by U.C. Berkeley AMPLab developer (and former Clouderan) Matt Massie on his personal blog. Matt has graciously permitted us to republish it here for your convenience.

Note: The post below is valid for Impala version 0.6 only and is not being maintained for subsequent releases. To deploy Impala 0.7 and later using a much easier (and also free) method, use this how-to.

Read more

A Ruby Client for Impala

Categories: General Guest Impala

Thanks to Stripe’s Colin Marc (@colinmarc) for the guest post below, and for his work on the world’s first Ruby client for Cloudera Impala!

At Stripe, as at most other companies, it has become increasingly hard to answer the big and interesting questions as datasets get bigger. This is pretty insidious: the set of potentially interesting questions also grows as you acquire more data. Answering questions like, “Which regions have the most developers per capita?”

Read more

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,

Read more

Meet the Engineer: Marcel Kornacker

Categories: Impala Meet the Engineer

In this installment of “Meet the Engineer”, meet Marcel Kornacker, the architect of the Cloudera Impala open-source real-time query engine for Apache Hadoop.

What do you do at Cloudera?

I’m a tech lead at Cloudera, working on the Cloudera Impala team. And although it’s not in my formal title, I’m also the architect of Impala. What that means in practice is that I have the very enviable but demanding job of not only creating Impala requirements,

Read more