Accelerating Projects in Machine Learning with Applied ML Prototypes

by Cloudera

Posted in Technical | October 26, 2022 4 min read

It’s no secret that advancements like AI and machine learning (ML) can have a major impact on business operations. In Cloudera’s recent report Limitless: The Positive Power of AI, we found that 87% of business decision makers are achieving success through existing ML programs. Among the top benefits of ML, 59% of decision makers cite time savings, 54% cite cost savings, and 42% believe ML enables employees to focus on innovation as opposed to manual tasks.

Data practitioners are at the top of the list of employees who are now able to put more focus on innovation.

Cloudera has seen a lot of opportunity to extend even more time-saving benefits specifically to data scientists with the debut of Applied Machine Learning Prototypes (AMPs). These AMPs help kickstart projects in machine learning by providing working examples of how to solve common data science use cases, enabling data scientists to move faster and focus more time on driving further innovation.

What are AMPs and why do they help?

AMPs are fully built end-to-end data science solutions that allow data scientists to go from an idea to a fully working machine learning solution in a fraction of the time. Accessible with a single click from Cloudera Machine Learning or via public GitHub repositories, AMPs provide an end-to-end framework for building, deploying, and monitoring business-ready ML applications.

AMPs were born from the observation that data scientists very rarely start a new project from scratch. The pattern that we most often observe is that after a data scientist understands the problem and the data that they have to work with, they search the internet to find an example of something similar to what they are trying to accomplish. Unfortunately, this pattern of development has some significant drawbacks: (1) there’s a lack of visibility into the author’s credibility; (2) there’s no guarantee that the code you find uses current best practices; and (3) it’s unknown whether the libraries used will work in your current environment.

AMPs are the solution to this age-old (well, 21st-century old) problem. Every AMP was built by a member of Cloudera’s ML research group, Fast Forward Labs. Each AMP goes through a rigorous review process by some of the brightest and most credible ML minds. AMPs are periodically reviewed and updated to ensure that methods and libraries are up to date. Lastly, each AMP ships with a requirements file so that a clean and consistent environment can be deployed with the correct dependencies.
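
To make the requirements-file point concrete, here is a minimal sketch of how you might check that an environment matches a set of pinned dependencies. It assumes a pip-style file with plain name==version lines (no extras or nested includes); the package names and versions in the example are hypothetical and are not taken from any particular AMP. The helper names parse_pins and find_mismatches are our own, for illustration only.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every plain 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or "==" not in line:
            continue  # skip blank lines, options, and unpinned packages
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Compare pins with the environment; return {package: (wanted, found)}."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Hypothetical requirements file contents, for illustration only.
example = """
# pinned dependencies
pandas==1.5.0
scikit-learn==1.1.2  # modelling
"""

# Prints the packages whose installed version differs from the pin
# (or that are missing); the result depends on your environment.
print(find_mismatches(parse_pins(example)))
```

An empty result means every pinned package is installed at exactly the requested version. In practice, installing from the requirements file into a fresh environment is the simpler route; a check like this is mainly useful for spotting drift in an environment that already exists.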

For anyone who might be thinking, “If you’re releasing complete machine learning projects, aren’t you already doing the data scientist’s job for them?”, the answer is a resounding no. These AMPs absolutely provide a starting point and allow data scientists to have a bit of a head start on their project, but they still require coding and iterations to fit the specific use case. By rolling out AMPs, we’re helping large organizations accelerate past the deployment hump that often occurs, despite large initial investments in ML.

What AMPs exist today, and what’s coming down the pipe?

The Fast Forward Labs team has developed and released more than a dozen AMPs to date, with more to come. AMPs so far include:

Deep Learning for Anomaly Detection: Apply modern, deep learning techniques for anomaly detection to identify network intrusions. This AMP benchmarks multiple state-of-the-art algorithms, with a front-end web application for comparing their performance.
Deep Learning for Image Analysis: Build a semantic search application with deep learning models. The project launches an interactive visualization for exploring the quality of representations extracted using multiple model architectures.
- This AMP can also be repurposed to help you find the most unique snowflake.
Analyzing News Headlines with SpaCy: Detect organizations being mentioned in Reuters headlines using SpaCy for named entity extraction. This notebook also demonstrates several downstream analyses.
Structural Time Series: Use an interpretable approach to forecasting electricity demand data for California. The AMP implements both a model diagnostic app and a small forecasting interface that allows asking smart, probabilistic questions of the forecast.
Distributed XGBoost with Dask: This AMP is one of our newest and was prioritized due to several requests from customers. It provides a Jupyter Notebook that demonstrates a typical data science workflow for detecting fraudulent credit card transactions by training a distributed XGBoost model with Dask, a library for scaling Python applications, using the CML Workers API.
And arguably, the most critical AMP to date: Finding Halloween candy surplus.

We are still hard at work on some new AMPs, too. One much-anticipated, soon-to-be-released AMP is another flavor of distributing Python workloads, this time with Ray. Much like Dask, Ray is a unified framework for scaling AI and Python applications. This AMP will give practitioners an example of another way to distribute their data science workloads.

How are AMPs benefiting companies?

The biggest benefit of AMPs is the ability to fast-track the adoption of machine learning. For one biotech company, the Streamlit AMP helped to get new apps in their tenant, enabling their data scientists to communicate results with business users. They also used the Churn Prediction demo for onboarding, as a reference of ML and Python best practices. Companies also rely on AMPs like continuous model monitoring to improve their MLOps capabilities. For other use cases, like natural language processing (NLP), we have a number of AMPs that can help.

AMPs are great demonstration tools for practitioners to use during conversations with their internal stakeholders, proofs of concept, and workshops. They are a great way to demonstrate value and pave the way for quick wins with machine learning. They are available immediately to download from GitHub. If you’d like to talk to us about how to do more with your machine learning, get in touch (contact info/link here).

AMP hackathon

If this blog inspired you to try your hand at creating your own AMP, then we’ve got just the thing for you. Cloudera, along with AMD, is sponsoring a hackathon where participants are tasked with creating their own unique applied ML prototype. Winning entrants will receive a cash prize, and their projects will be reviewed by Cloudera Fast Forward Labs and added to the AMP Catalog.

If you have a project that you would love to share with the community, are looking to differentiate your resume from the masses, and/or could use some extra cash, then sign up for your chance to win!
