New in Cloudera 5.14: Query Assistance improvements and ADLS integration for the Self Service Analytic Database

Categories: Analytic Database Cloud Hue Impala Search

Self-service BI and exploratory analytics are some of the most common use cases we see our customers running on Cloudera’s analytic database solution. Over the past year, we made significant advancements to provide a more powerful user experience for SQL developers and make them more productive for their everyday self-service BI tasks and workflows. Leveraging Hue as the SQL development workbench, we continue to see usage of the platform increase and the number of analytic use cases grow – with several thousands of weekly users of Hue running multiple million of queries.

With the recent release of Cloudera 5.14, we continued to improve the query experience with Hue – focusing on faster troubleshooting and unique query assistance capabilities to help prevent rogue queries. We also expanded our cloud integration – allowing users to discover and explore data directly in ADLS, as well as in AWS S3, HDFS, and Apache Kudu. Read on to learn more about these and try it out in one-click at demo.gethue.com.

Demo use case: self service Customer 360

A good summary of the capability of the Analytic DB and the improvements of Cloudera 5.14 are showcased in this demo. In this Self Service analytics Customer 360 scenario, the Editor is used to query credit card transaction data that is saved in an object store in the cloud and in an Apache Kudu table. The demos leverages the Data Catalog search and tagging as well as the Query Assistant. 

Analytic DB

 

Data Catalog Search

Before typing any query to get insights, users need to find and explore the correct datasets. The Data Catalog search was introduced in Cloudera 5.11 and its usability experience keeps improving. It is now available in the top bar of the interface and offers free text search of SQL tables, columns, tags and saved queries. This is particularly useful for quickly looking up a table among thousands or finding existing queries already analysing a certain dataset.

Some example of searches:

  • table:customer → Find the customer table
  • table:tax* tags:finance → List all the tables starting with tax and tagged with ‘finance’Data Catalog Search

Syntax Checker

When the data is found, writing the correct query is not always an easy task. Here are some improvements that will facilitate it.

The Editor provides close to 100% of the Apache Impala and Hive syntax, allowing fast typing of SQL queries. This release introduces the syntax checker that will highlight wrong syntax or SQL objects that do not exist. It can also suggest simple fixes. These warnings make typing queries more productive, as the visual checks happen before the query submission.

Example of a wrong FROM syntax highlighted on line 2

Example of a wrong FROM syntax highlighted on line 2

Another one showing suggestions

Another one showing suggestions

Where globalname is a non-existing column

Where globalname is a non-existing column

Risk alerts with recommendations

Risk alerts and popular values were introduced in Cloudera 5.12 in order to assist the SQL users in writing quicker and more efficient queries. Now suggestions on how to fix the alerts are available directly within the editor. By clicking on the risk alert, a list of suggestions will be proposed, and in one click the query will be automatically modified to be safer or more performant.

Example of a risk alert on querying a table without specifying any partitioning to restrict the size of data in input

Clicking on "Add Filter" will propose a list of popular filter values to be injected directly into the SQL query

Clicking on “Add Filter” will propose a list of popular filter values to be injected directly into the SQL query

Impala Query Browser

After finding the data and having written the queries, we’re are not done yet. Troubleshooting failed or long-running queries is also a key part of an efficient workflow.

Image of the built-in query information without leaving the Editor page and context

Image of the built-in query information without leaving the Editor page and context

The new Impala Query Browser provides built-in troubleshooting for queries, in particular about these key metrics.

Profiles

Display the profile which gives you physical execution of the query in great detail. This view is used to analyze data exchange between the various operator and the performance of the IO (disk, network, CPU). You can use this to reorganize the location of your data (on disk, in memory, different partitions or file formats).

Plans

The query plan is a condensed version of the summary report in graphical form. It gets updated on real time.

Visual representation of the operations of a query

Visual representation of the operations of a query

Memory consumptions

To determine if the memory available to your query is sufficient, it provides the memory profile – containing information about the memory usage during the execution of the query.

Read more about it in this blog post.

Variables

The Editor now lets you define default values of variables, so that copy/pasting or building reports remains easy. A typical use case is then to share a set of queries in the Data Catalog so that the knowledge can be re-used and tweaked by other teams.Editing country_code

List of values also now offer a simple way to parameterize variables in order to build reports that can be reused by a broader range of users.List of country_code values

It is even possible to provide an alternative text to the values:

Solr Dashboard autocomplete

Not only are SQL users able to efficiently discover and analyze data, but the platform also supports interactive Apache Solr Search, merging text search capabilities within BI and analytic workflows.

Solr is an engine powering the dynamic Dashboards. One advantage of Solr is to allow very quick access of matching values of data, even in huge datasets. In this version, the search bar autocomplete provides an interactive matching of values to filter on as you type.

Live filtering of a list of thousand of wine types

Read more about it in this blog post.

Cloud

Cloud is increasingly popular for powering analytic workloads such as data preparation and data warehousing. Regardless of where data lives, Hue lets you explore, query, and share results across HDFS, Kudu, AWS S3, and now the Microsoft Azure ADLS filesystem.

In particular, it provides functionalities that makes data analysis in the Cloud more productive:

  • Exploring ADLS in file browser
  • Create Hive Tables Directly From ADLS
  • Save Query Results to ADLS

Hue file browsersThe integration is easy to plug and play.

  1. Add the ADLS credentials in Cloudera Manager:Adding ADLS credentials in CM
  2. Then the ADLS connector will auto configure the cluster components likes Impala, Hive, Hue:ADLS connector auto configure

Read more about it in this blog post.

We hope that this new version of the Analytic DB interface makes self-service data discovery and analytics easier and faster. If you have any questions or feedback, feel free to comment here, on the community forum or via @gethue!

 

Facebooktwittergoogle_pluslinkedinmailFacebooktwittergoogle_pluslinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked *

Prove you're human! *