The following post was originally published by the Hue Team on the Hue blog in a slightly different form.
Hue, the open source web GUI that makes Apache Hadoop easy to use, has supported Cloudera Impala since its inception to enable fast, interactive SQL queries from within your browser. In this post, you’ll see a demo of Hue’s Impala app in action and explore its impressive query speed for yourself.
Impala App Demo
The demo below compares some queries across Hue’s Apache Hive and Impala applications. (Impala supports a broad range of SQL and HiveQL commands.) Although this comparison is not scientific, it does reflect general user experience across common cases.
In many ways, using Impala through the Hue app is easier than using it through the command-line impala-shell. For example, table names, databases, columns, and built-in functions are auto-completed, and syntax highlighting helps you spot potential typos in your queries. You can execute multiple queries, or only a selected portion of a query, from the editor. Parameterized queries are supported, and you are prompted for the values at submission time. You can also save Impala queries, share them with other users, or delete them (and then restore them in case of mistakes).
Impala uses the same Metastore as Hive, so you can browse tables with the Metastore app. You can also pick a database from a drop-down in the editor. After submission, progress and logs are reported, and you can browse the results with infinite scroll or download the data through your browser.
Comparing Query Speeds
If you have Hue installed and want to compare query speed between Impala and Hive for yourself, start with the Hue examples. Although the queries are very small, they nonetheless illustrate the lightning speed of Impala.
To do that, confirm that the Hive and Impala examples are installed in Hue. Then, in each app, go to Saved Queries, copy the query "Sample: Top salaries", and submit it.
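For reference, the saved query has roughly the following shape. This is only a sketch: it assumes the examples install a table named sample_07 with description and salary columns, and the threshold and limit values are illustrative. Copy the actual query text from Saved Queries in your own installation.

```sql
-- Sketch of a "top salaries" style query.
-- Assumed names: table sample_07, columns description and salary.
SELECT sample_07.description, sample_07.salary
FROM sample_07
WHERE sample_07.salary > 100000   -- illustrative threshold
ORDER BY sample_07.salary DESC
LIMIT 1000
```

Submitting the same text in both the Hive and the Impala editor keeps the comparison like-for-like.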
You can also try queries on more complex data, for example Yelp restaurants and reviews:
SELECT r.business_id, name, SUM(cool) AS coolness
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
You can see the benefits of Impala’s architecture and optimization first-hand! Results come back fast overall, and in the Yelp data case, instantaneously.
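The '$date' token in the Yelp query is one of the parameterized values that Hue prompts for at submission time. As a minimal sketch, assuming the same review and business tables, a second variable can be introduced the same way. The $category variable and the LIMIT clause are illustrative additions, not part of the original query.

```sql
-- $category and $date are Hue variables; Hue prompts for both on submission.
-- $category is a hypothetical addition for illustration.
SELECT r.business_id, name, SUM(cool) AS coolness
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%$category%'
AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10   -- illustrative cap on the number of rows returned
```

Entering Restaurants for $category reproduces the original filter, while other values let you reuse the saved query for different business categories.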
Impala and Hue combined are a recipe for fast analytics. If you want to build your own client, you can also reuse Hue's Python API.
Cloudera’s QuickStart VM with its Hadoop tutorials is a great way to get started with Impala and Hue. An upcoming blog post will describe how to use more efficient file formats in Impala.