Make the Leap to AI Driven Data Applications

The start of a new year is a perfect time to reflect on what we accomplished, look forward, and re-evaluate what we can do better. Change, although difficult at first, can also be very rewarding. That’s why I was excited to see similar sentiments shared at ThoughtSpot Beyond 2021 about moving beyond the traditional dashboards of the past. As roles within organizations evolve (as seen by the growth of citizen scientists and analytics engineers) and as data needs change (think schema changes and real-time data), we need more intelligent ways to explore data visually, interrogate it, and share insights. Dashboards often look in the rearview mirror, focusing on historical data rather than on future insights, i.e., predictive analytics.

The explosion of new and more accessible ML tooling means there’s never been a better time to take the leap into predictive analytics than right now. 

Since the introduction of Cloudera Data Visualization (DV) in October 2020, we’ve been focused on demonstrating the benefits of expanded, self-service access to data analytics and predictive insights to all of our customers. Democratizing data access breaks down silos and opens insights to every stage of the business operation. Business users and analysts with subject matter expertise can tap into their own data domains to drive value where it was previously not possible due to a lack of tooling or technical expertise.

DV is natively integrated with Cloudera Data Platform (CDP), enabling self-service direct access to data from anywhere with the ability to quickly power visual data discovery and exploration across the entire analytical and machine learning lifecycle. Tight integration with Cloudera Machine Learning (CML) allows users to take predictive insights built in CML and make them accessible through DV applications.

To show this in action, we will use the airline flights dataset to demonstrate some of the ways you can start incorporating predictive analytics in your visual applications. 

Jump start your journey with AMPs

Instead of starting from scratch, Applied ML Prototypes (AMPs) provide pre-built templates for many commonly used machine learning techniques such as time series forecasting, churn modeling, and anomaly detection. In Cloudera Machine Learning (CML), users can bootstrap their projects by simply selecting one of the prototypes and filling out a few boxes.

Figure: CML’s Applied ML Prototypes (AMPs)

For our flights dataset we will use the flight cancellation AMP as our starting point. The project generated by the AMP will predict cancellations. First, a simple configuration wizard can be used to set up the AMP-based project. Users can modify the default directories and runtime engines as needed.

Next, when we click Launch, the project runs through a series of steps, from creating project artifacts such as the data and directories all the way to training a prediction model and deploying it as a REST endpoint.

The project the AMP generates serves as a blueprint, and any aspect of it can be modified, including the model. For example, we can switch out the XGBoost classifier for another, making it easy to test new models with minimal effort.
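As a rough illustration of why swapping the classifier is cheap, the sketch below keeps the training step dependent only on a small fit/predict_proba interface (the scikit-learn style convention that XGBoost's classifier also follows). The two toy classifiers, the `train` helper, and the sample rows are all hypothetical stand-ins written for this example; they are not the AMP's code and not XGBoost.

```python
from typing import Protocol, Sequence


class Classifier(Protocol):
    """Minimal interface the training step relies on (scikit-learn style)."""

    def fit(self, X: Sequence[dict], y: Sequence[int]) -> "Classifier": ...

    def predict_proba(self, X: Sequence[dict]) -> list[float]: ...


class BaseRateClassifier:
    """Toy stand-in: always predicts the overall cancellation rate."""

    def fit(self, X, y):
        self.rate_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [self.rate_ for _ in X]


class CarrierRateClassifier:
    """Toy stand-in: predicts the cancellation rate seen for each carrier."""

    def fit(self, X, y):
        labels_by_carrier = {}
        for row, label in zip(X, y):
            labels_by_carrier.setdefault(row["carrier"], []).append(label)
        self.rates_ = {c: sum(v) / len(v) for c, v in labels_by_carrier.items()}
        self.default_ = sum(y) / len(y)  # fallback for unseen carriers
        return self

    def predict_proba(self, X):
        return [self.rates_.get(row["carrier"], self.default_) for row in X]


def train(model: Classifier, X, y) -> Classifier:
    """Training step that does not care which concrete classifier it gets."""
    return model.fit(X, y)


# Made-up rows purely for illustration; not taken from the flights dataset.
X = [{"carrier": "AA"}, {"carrier": "AA"}, {"carrier": "DL"}, {"carrier": "DL"}]
y = [1, 1, 0, 0]

baseline = train(BaseRateClassifier(), X, y)
per_carrier = train(CarrierRateClassifier(), X, y)
```

Because `train` only touches the shared interface, trying a different model is a one-line change at the call site.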

Figure: Launch screen of the Flight Prediction AMP

Figure: AMP-based project with all artifacts deployed

Embed AI into your applications

Once we have our project set up and have refined the ML classifiers to suit our needs, we are ready to deploy the model. Models are deployed as REST endpoints that any external (or internal) application can call to obtain prediction results.

Again, CML makes this process simple.

Create the Predict Function

We use the flight cancellation model that was already set up by our AMP project and write a simple function that takes input variables (such as CARRIER, ORIGIN, DEST, WEEK, HOUR) and produces two outputs: the predicted cancellation and its associated confidence, expressed as a probability. This function serves as a wrapper around the model. Its main job is to translate the JSON payload exchanged with the invoking DV application, parsing the input fields and returning the prediction results.

Figure: Wrapper predict function to be called by our DV application
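For readers without the screenshot, a wrapper of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the AMP's actual code: `score_cancellation` is a hypothetical placeholder for the trained classifier, the output field names `prediction` and `proba` are invented for the example, and the 0.5 threshold is arbitrary.

```python
import json

# Input fields named in the post.
FEATURES = ("CARRIER", "ORIGIN", "DEST", "WEEK", "HOUR")


def score_cancellation(features: dict) -> float:
    """Hypothetical stand-in for the trained model; returns P(cancelled).

    A real project would load the fitted classifier produced by the AMP.
    The rule below is an arbitrary placeholder, not a real model.
    """
    return 0.8 if features["HOUR"] >= 20 else 0.1


def predict(args: dict) -> dict:
    """Validate the JSON-style input, call the model, shape the output."""
    missing = [name for name in FEATURES if name not in args]
    if missing:
        raise ValueError(f"missing input fields: {missing}")

    # Coerce types, since values may arrive as strings in a JSON payload.
    features = {
        "CARRIER": str(args["CARRIER"]),
        "ORIGIN": str(args["ORIGIN"]),
        "DEST": str(args["DEST"]),
        "WEEK": int(args["WEEK"]),
        "HOUR": int(args["HOUR"]),
    }
    probability = score_cancellation(features)
    return {
        "prediction": int(probability >= 0.5),  # 1 = predicted cancelled
        "proba": round(probability, 4),         # probability of cancellation
    }


# Example call with a JSON payload like one the application might send.
payload = json.loads(
    '{"CARRIER": "AA", "ORIGIN": "JFK", "DEST": "LAX", "WEEK": "12", "HOUR": "21"}'
)
result = predict(payload)
```

Keeping validation and type coercion in the wrapper means the model itself never has to deal with malformed requests.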

Deploying the Function

Next, we need to deploy our prediction function as a new REST endpoint. Since the AMP already did this for its own model, we can simply replicate the same process. When deploying the function as a model, we need to make note of the URL and the access key, as both will be used in later steps.

Invoking the Model 

Once we have the model endpoint deployed, we can invoke it from within our application. DV makes this simple by providing an out-of-the-box function (cviz_rest) that takes the model endpoint URL and access key as input, along with the input and output variables.

We create a new calculated column (“Cancellation Prediction”) in our flight dataset using cviz_rest() in an expression. The inputs map to columns within our dataset: uniquecarrier, origin, dest, week, and schdephr. The response column holds the prediction results. These should all look familiar, since they are the inputs and outputs of the predict function we created earlier. We are simply letting DV know which fields in our dataset should be used when invoking the REST endpoint.

Figure: Invoking model endpoint from DV
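To make the column mapping concrete, here is a rough Python sketch of the kind of request a client could send to the model endpoint for a single dataset row. It does not show how DV implements cviz_rest. The payload shape (an `accessKey` field plus a `request` object) is an assumption based on the usual format for CML model endpoints, and the URL and key used below are placeholders.

```python
import json
import urllib.request

# Dataset columns (as named in the post) mapped to the model's input names.
COLUMN_TO_INPUT = {
    "uniquecarrier": "CARRIER",
    "origin": "ORIGIN",
    "dest": "DEST",
    "week": "WEEK",
    "schdephr": "HOUR",
}


def build_request(url: str, access_key: str, row: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one dataset row."""
    inputs = {name: row[col] for col, name in COLUMN_TO_INPUT.items()}
    # Assumed payload shape: the access key plus a "request" object of inputs.
    body = json.dumps({"accessKey": access_key, "request": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a live endpoint)."""
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Placeholder URL and key; substitute the values noted at deployment time.
row = {"uniquecarrier": "AA", "origin": "JFK", "dest": "LAX", "week": 12, "schdephr": 17}
req = build_request("https://example.invalid/model", "PLACEHOLDER_KEY", row)
```

Separating `build_request` from `call_model` makes it possible to inspect the exact payload before anything is sent over the network.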

Final Application

With the dataset modeling complete, we can start creating our visual application to take advantage of the predictive insights.

Here we have taken a tabular view and augmented it with our prediction. We have included the input columns (uniquecarrier, origin, dest, week, schdephr) along with our calculated column “Cancellation Prediction” in our visualization. For each entry in the table, DV automatically invokes the model endpoint and displays the prediction results.

It’s also easy to check the accuracy of our model against the actual data. We color code the model results and the actual cancellations to make the comparison visual. In this view, the model predictions track the actual cancellations fairly closely, giving us confidence in using the model for operational planning for upcoming flights.

Figure: Fully Interactive and predictive application using Cloudera Data Visualization to monitor flight cancellations
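The visual comparison of predicted and actual cancellations can be backed by a simple tally. The sketch below is a minimal example, assuming each row carries a `predicted` flag and a `cancelled` flag (both hypothetical column names); the sample rows are made up for illustration and are not results from the flights dataset.

```python
from collections import Counter


def compare(rows: list[dict]) -> dict:
    """Tally predicted vs. actual cancellations and compute accuracy."""
    counts = Counter((int(r["predicted"]), int(r["cancelled"])) for r in rows)
    correct = counts[(1, 1)] + counts[(0, 0)]
    return {
        "true_positive": counts[(1, 1)],   # predicted cancelled, was cancelled
        "false_positive": counts[(1, 0)],  # predicted cancelled, flew
        "false_negative": counts[(0, 1)],  # predicted to fly, was cancelled
        "true_negative": counts[(0, 0)],   # predicted to fly, flew
        "accuracy": correct / len(rows) if rows else 0.0,
    }


# Made-up rows for illustration only.
sample = [
    {"predicted": 1, "cancelled": 1},
    {"predicted": 0, "cancelled": 0},
    {"predicted": 0, "cancelled": 0},
    {"predicted": 1, "cancelled": 0},
]
summary = compare(sample)
```

Since cancellations are typically rare, the false positive and false negative counts are worth reading alongside overall accuracy, which can look high even for a model that rarely predicts a cancellation.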

Search your way to insights

Introduced early last year, the Natural Language Search in DV allows users to ask questions of their data using a simple search bar. As the user types, DV automatically sifts through search-enabled datasets, matching columns and keywords to the visualizations that best fit the requested data elements.

“Top 10 airlines by flights” turns into a bar chart of the airlines with the largest number of flights, while “Trend of flights” returns a time series graph showing total flights as a line. The system intelligently applies heuristics to return what the user needs without resorting to a full-blown visual builder.

Search is more appealing to users who are looking for quick insights.  It also helps lower the barrier to data access, without the need for training on a new tool or writing code. 

Figure: Interrogate your data in new ways with Cloudera Data Visualization’s Natural Language Search interface

Ready to take the leap?

Change can come in leaps or increments, and Cloudera Data Visualization gives you the flexibility to experiment, tweak, and learn how your business processes and users can benefit from AI driven data applications. It can be as simple as using the natural language search UI for self-service exploration of new datasets, or deploying a model to drive a fully interactive and predictive application.

We need to stop looking backwards for insights, and 2022 is the perfect time to start looking forwards with AI driven applications. To learn more about Cloudera Data Visualization, sign up for a free trial and see it for yourself. And stay tuned for part 2 of the Make the Leap New Year’s resolution series as we explore hybrid deployments with Cloudera Data Engineering.

Shaun Ahmadian
Jon Ingalls