How-to: Use Impala on Amazon EMR


Developers, rejoice: Impala is now available on EMR for testing and evaluation.

Amazon Web Services recently announced support for running Cloudera Impala queries on its Elastic MapReduce (EMR) service. This is good news for EMR users, as well as for users of other platforms who want to kick Impala’s tires with minimal friction. It’s also another sign that Impala is rapidly being adopted across the ecosystem as the gold standard for interactive SQL and BI queries on Apache Hadoop.

AWS also helpfully provides a tutorial for launching and querying Impala clusters on EMR. It covers:

  • Signing up for Amazon EMR, 
  • Launching a cluster with Impala installed,
  • Connecting to the cluster using SSH,
  • Generating a test data set,
  • Creating Impala tables and populating them with data, and
  • Performing interactive queries on Impala tables.
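
To give a feel for the last two steps, here is a minimal sketch of what creating and querying an Impala table might look like once you are logged in to the cluster's master node. It is not taken from the AWS tutorial: the table name `books`, its columns, the HDFS path `/data/books/`, and the helper `impala_shell_command` are all hypothetical, chosen only for illustration. The sketch only builds `impala-shell` command lines (using the shell's `-i` host and `-q` query options) and prints them; it does not contact a cluster.

```python
import shlex

# Hypothetical table definition: name, columns, delimiter, and HDFS
# location are assumptions for illustration, not the tutorial's schema.
CREATE_TABLE = (
    "CREATE EXTERNAL TABLE IF NOT EXISTS books ("
    "id BIGINT, isbn STRING, category STRING, price FLOAT) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' "
    "LOCATION '/data/books/'"
)

# Hypothetical interactive query against that table.
TOP_CATEGORIES = (
    "SELECT category, COUNT(*) AS n FROM books "
    "GROUP BY category ORDER BY n DESC LIMIT 10"
)


def impala_shell_command(query, host="localhost"):
    """Build the argument list for running one query through impala-shell.

    -i names the impalad host to connect to; -q passes a single query.
    """
    return ["impala-shell", "-i", host, "-q", query]


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works
    # on a machine without Impala installed.
    for query in (CREATE_TABLE, TOP_CATEGORIES):
        print(" ".join(shlex.quote(arg) for arg in impala_shell_command(query)))
```

On a real cluster, the printed commands could be pasted into the SSH session, or the argument lists could be handed to `subprocess.run`. The tutorial itself remains the authoritative reference for the actual schema and data set.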

You may also want to read the AWS FAQ about Impala on EMR.

Developers, start your clusters! Impala is now on EMR for easy testing and evaluation.

Justin Kestelyn is Cloudera’s developer outreach director.