Enterprise Data Science Workflows with AMPs and Streamlit

by Chris Wallace

Posted in Business | July 27, 2021 | 3 min read

Here in the virtual Fast Forward Lab at Cloudera, we do a lot of experimentation to support our applied machine learning research and Cloudera Machine Learning product development. We believe the best way to learn what a technology is capable of is to build things with it. Only through hands-on experimentation can we tell truly useful new algorithmic capabilities from hype.

Some examples of our recent experiments:

Building a natural language question answering interface to Wikipedia
Fitting Prophet models with complex seasonalities for electricity demand forecasting
Exploring how inference works in RetinaNet for object detection

Understanding the technologies underlying these examples – both what they can do, and how they work – relied heavily on exploration and visualization.

The entirely custom front-end to one of our prototype applications, built around a probabilistic model of NYC real estate.

We have a history of building out full-featured front-ends for our prototypes, like NeuralQA, ConvNet Playground, and Probabilistic Real Estate. These fleshed-out web applications are representative end products of data science work. They’re outward facing: polished enough to be presented to enterprise users. Recently, we’ve been bringing these front-ends to Cloudera Machine Learning with Applied Machine Learning Prototypes (AMPs). AMPs accelerate machine learning projects and kickstart AI use cases by providing example workflows and applications that leverage the power of the platform.

Not every project requires a fully custom web app. In 2016, Ines Montani of Explosion wrote “How front-end development can improve data science,” and five years later those words still ring true. There are many uses for interactive applications in the machine learning development lifecycle, and not all of them require a unique front-end. This is fortunate, because few data scientists are web developers on the side. When exploring a new and challenging data science problem, development speed and rapid iteration cycles reign supreme.

We’ve found that Streamlit hits a sweet spot for “primarily Python” data scientists. With just a short Python script, we can whip up an interactive web application, directly connected to the data and models in our Python session, and easily serve it as an Application on Cloudera Machine Learning (CML). The pure-Python nature of Streamlit grants ease of use and familiarity, while being flexible enough to build out most of what we need for exploratory work (and indeed, you can write custom front-end components if you have the skills and inclination).
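To give a flavour of what "a short Python script" can mean, here is a minimal sketch of a Streamlit app. It is not taken from any of our AMPs: the `sine_wave` helper, the widget labels, and the file name `app.py` are all illustrative assumptions. The guarded import only exists so the helper can be used where Streamlit is not installed.

```python
"""Illustrative Streamlit sketch; run with `streamlit run app.py`."""
import math

try:
    import streamlit as st
except ImportError:  # lets sine_wave be imported and tested without Streamlit
    st = None


def sine_wave(n_points, frequency):
    """Return n_points samples of `frequency` full sine cycles (hypothetical helper)."""
    return [
        math.sin(frequency * 2 * math.pi * i / n_points)
        for i in range(n_points)
    ]


def main():
    st.title("Sine wave explorer")
    # Each widget call returns its current value; Streamlit reruns the
    # whole script from the top whenever a widget value changes.
    frequency = st.slider("Frequency", min_value=1, max_value=10, value=2)
    n_points = st.slider("Number of points", min_value=10, max_value=500, value=100)
    # The chart is redrawn from the freshly computed data on every rerun.
    st.line_chart(sine_wave(n_points, frequency))


if __name__ == "__main__" and st is not None:
    main()
```

In real exploratory work, the helper would be replaced by a call into whatever data or model is being studied; the widgets and the chart call stay about this short.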

Streamlit allows us to rapidly build interfaces to our models, and is the end point of several of our AMPs:

We can prototype user-facing applications suitable for internal tools, as in Deep Learning for Question Answering.
We can build assistive diagnostic interfaces for model building, as in Structural Time Series.
We can explore model behaviour, as in Few-Shot Text Classification.
Or we can write interactive, explanatory content, as in Object detection inference visualized.

To make it easy for you to get started with Streamlit as part of your enterprise data science workflow in CML, we created a small starter application. Clone it from the Streamlit on CML repository, or find it in the AMP tab of your CML install!

The application in our minimal Streamlit on CML starter kit.
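Serving a Streamlit script as a CML Application mostly comes down to starting Streamlit on the port the platform hands to the application. The launcher below is a rough sketch under stated assumptions rather than the starter kit's actual code: it assumes CML exposes the port in the `CDSW_APP_PORT` environment variable and expects the app to listen on `127.0.0.1` (check both against the documentation for your CML version), and `app.py` and `build_launch_command` are hypothetical names.

```python
"""Illustrative launcher for a Streamlit app inside a CML Application."""
import os
import subprocess


def build_launch_command(script_path, env):
    """Build the `streamlit run` command line (hypothetical helper)."""
    # ASSUMPTION: CML provides the port to bind in CDSW_APP_PORT.
    # 8501 is Streamlit's default port, used here as a local fallback.
    port = env.get("CDSW_APP_PORT", "8501")
    return [
        "streamlit", "run", script_path,
        "--server.port", str(port),
        "--server.address", "127.0.0.1",
    ]


# Only launch when the assumed CML variable is present, so that running
# this file elsewhere does not start a server unexpectedly.
if __name__ == "__main__" and "CDSW_APP_PORT" in os.environ:
    subprocess.run(build_launch_command("app.py", os.environ), check=True)
```

Keeping the command construction in a small function makes the port handling easy to check without starting a server.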

To learn how to create and deploy ML models in web apps in a fraction of the usual time, register for our webinar: “Automating Sharable AI Web Apps with Streamlit and Cloudera”.

Chris Wallace

Research Lead at Cloudera Fast Forward
