Category Archives: Impala

From Zero to Impala in Minutes

Categories: Cloud Guest How-to Impala

This was post was originally published by U.C. Berkeley AMPLab developer (and former Clouderan) Matt Massie, on his personal blog. Matt has graciously permitted us to re-publish here for your convenience.

Note: The post below is valid for Impala version 0.6 only and is not being maintained for subsequent releases. To deploy Impala 0.7 and later using a much easier (and also free) method, use this how-to.

Read more

A Ruby Client for Impala

Categories: General Guest Impala

Thanks to Stripe’s Colin Marc (@colinmarc) for the guest post below, and for his work on the world’s first Ruby client for Cloudera Impala!

Like most other companies, at Stripe it has become increasingly hard to answer the big and interesting questions as datasets get bigger. This is pretty insidious: the set of potential interesting questions also grows as you acquire more data. Answering questions like, “Which regions have the most developers per capita?”

Read more

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS experienced major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the Beta release of Cloudera Impala,

Read more

Meet the Engineer: Marcel Kornacker

Categories: Impala Meet the Engineer

Marcel Kornacker

In this installment of “Meet the Engineer”, meet Marcel Kornacker, the architect of the Cloudera Impala open-source real-time query engine for Apache Hadoop.

What do you do at Cloudera?

I’m a tech lead at Cloudera, working on the Cloudera Impala team. And although it’s not in my formal title, I’m also the architect of Impala. What that means in practice is that I have the very enviable but demanding job of not only creating Impala requirements,

Read more

External Hands-on Experiences with Cloudera Impala

Categories: Impala

The beta release of Cloudera Impala, the first (and open source) real-time query engine for Apache Hadoop, has been out in the wild (in binary as well as VM forms) for over a month now, and users have had time to get up-close and hands-on. Consequently, we’re beginning to see some fascinating self-published observations and guides.  

Here are just a few examples; you may know of more that we’ve missed:

Read more