Change The Way You Do ML With Applied ML Prototypes

Posted in Business | February 25, 2021

Today’s enterprise data science teams play one of the most challenging, yet most important, roles in your business’s ML strategy. Businesses that have adopted a successful ML strategy are outperforming their competitors by over 9%, so the implications of ML for the future of business are clear. Yet only 4% of enterprise executives today report seeing success from their ML investments. Many factors contribute to this gap, but one of the most common hurdles is simply getting projects off the ground: selecting the approaches, algorithms, and applications that will lead to fast results and trustworthy decision making.

Cloudera has a front-row seat to the organizational challenges enterprises face as they make machine learning a core part of their strategies and businesses. With almost all of the Fortune 500 and a majority of the Global 2000 relying on Cloudera for their most important data assets, Cloudera Machine Learning (CML) is the way enterprises do ML. We work with the largest companies in the world to help them tackle their most challenging ML problems.

To address these challenges directly, we’ve released Applied ML Prototypes (AMPs), a revolutionary new way of developing and shipping enterprise ML use cases. AMPs are complete ML projects that can be deployed with one click directly from Cloudera Machine Learning. They provide an end-to-end framework for building, deploying, and monitoring business-ready ML applications, so data scientists can go from an idea to a fully working ML use case in a fraction of the time.

AMPs move the starting line for any ML project by enabling data scientists to start with a full end-to-end project developed for a similar use case, including a trained and deployed ML model, as well as prebuilt predictive business applications, out of the box. This means that ML development teams can tackle their own ML business use cases more quickly, from those involving churn modeling, to sentiment analysis, to anomaly detection and beyond.

Applied ML Prototypes

AMPs capture industry-specific, use-case-specific, and application-specific best practices to take the guesswork out of ML projects. Additionally, the AMPs catalog within CML is completely customizable, enabling your data science teams to build and securely share internal projects as AMPs. This means it’s easier than ever to share organizational knowledge and deliver more repeatable solutions with best practices built in.

AMPs are a revolutionary way to accelerate your ML initiatives

The work of a machine learning model developer is highly complex. Model developers need strong data exploration and visualization skills, as well as enough data engineering chops to fix the gaps they find in their initial study. They need enough knowledge of dozens of ML techniques to choose the right approach for a given use case, a thorough understanding of everything required to execute on that use case, and a solid foundation in statistics to ensure their choices and implementations are mathematically sound and appropriate. These are core skills for any data scientist or model developer.

But as many organizations can attest, developing good models is only half the battle. Delivering ML models into the business with the right production ML tooling (including deployment, monitoring, and governance) is often the bigger challenge. Then there are the ethical and legal responsibilities of ensuring that humans can understand model predictions and that models have not learned problematic biases, either directly or indirectly.

Many techniques, frameworks, tools, and libraries have been developed to address these challenges, but learning how each one works and how it integrates with your project is a time-consuming barrier to taking ML projects to production.

AMPs are reference examples that combine the work of data engineers, data scientists, and application developers on a given use case. They lower the barrier to entry for ML development teams by letting everyone across the ML lifecycle see the overall approach taken for a specific type of use case, or by demonstrating how to use a specific technique, tool, or library.

AMPs help everyone from junior data scientists, who can follow an AMP end to end as a guide to building a solution for a similar use case, to the most senior ML developers, who can save time by seeing how to use an unfamiliar library, such as one for model interpretability.

Built By Experts At The Leading Edge Of ML Innovation

AMPs are produced by Cloudera Fast Forward Labs (CFFL), a team of ML/AI research engineers focused on making the recently possible in machine learning practical and usable by businesses today. CFFL has published almost two dozen research reports, each accompanied by detailed prototypes that demonstrate the capabilities it describes. These reports are being open-sourced for the first time, and with AMPs, the research team’s deep knowledge and expertise is now available to your data scientists directly in CML.

AMPs highlight the latest best practices and cutting-edge techniques and technologies for machine learning projects. That makes them high-quality reference examples that organizations can use to jumpstart their own projects, including many that would otherwise be very difficult to productionize.

Applied ML Prototypes Available To You Today!

Ten AMPs are available out of the box today, with dozens more coming in the next few months. Here’s a preview of what you can deploy with one click in CML:

Deep Learning for Anomaly Detection

Apply modern deep learning techniques for anomaly detection to identify network intrusions. This AMP benchmarks multiple state-of-the-art algorithms and includes a front end for comparing their performance.

Deep Learning for Image Analysis

Build a semantic search application with deep learning models. The project launches an interactive visualization for exploring the quality of representations extracted using multiple model architectures.

NeuralQA

Launch a visual interface for question answering that supports BERT models and information retrieval methods.

Analyzing News Headlines with spaCy

Detect organizations mentioned in Reuters headlines, using spaCy for named entity extraction. The accompanying notebook also demonstrates several downstream analyses.

Structural Time Series

Forecast California electricity demand with an interpretable approach. The AMP implements both a model diagnostic app and a small forecasting interface that lets you ask probabilistic questions of the forecast.
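The description does not say how the forecasting interface answers probabilistic questions. As a minimal sketch, assuming the forecast for a single hour can be summarized as a normal distribution with a mean and a standard deviation (a structural time series model yields a full predictive distribution; the normal summary is our simplification), a question such as "how likely is demand to exceed a given level?" reduces to evaluating a cumulative distribution function. The function names and the megawatt figures below are hypothetical, not taken from the AMP.

```python
from statistics import NormalDist


def prob_demand_exceeds(mean_mw: float, std_mw: float, threshold_mw: float) -> float:
    """P(demand > threshold) for one forecast step, under an assumed normal summary."""
    if std_mw <= 0:
        raise ValueError("std_mw must be positive")
    forecast = NormalDist(mu=mean_mw, sigma=std_mw)
    return 1.0 - forecast.cdf(threshold_mw)


def prob_in_range(mean_mw: float, std_mw: float, low_mw: float, high_mw: float) -> float:
    """P(low <= demand <= high) for one forecast step, under the same assumption."""
    if std_mw <= 0:
        raise ValueError("std_mw must be positive")
    forecast = NormalDist(mu=mean_mw, sigma=std_mw)
    return forecast.cdf(high_mw) - forecast.cdf(low_mw)


# Hypothetical forecast: mean 30,000 MW with a 1,500 MW standard deviation.
p = prob_demand_exceeds(30000.0, 1500.0, 33000.0)  # threshold is two standard deviations up
print(f"{p:.3f}")  # 0.023
```

If the model exposes forecast samples rather than a mean and standard deviation, the same question can be answered by counting the fraction of samples above the threshold, which avoids the normality assumption.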

Churn Modeling with scikit-learn

Build a scikit-learn model to predict churn using customer telco data, and interpret each prediction with LIME.

MLflow for Experiment Tracking

MLflow’s experiment tracking capabilities offer a low-friction way of tracking model hyperparameters and metrics across many experiments. This AMP shows you how to get started with MLflow in CML.

Explaining Models with LIME and SHAP

Interpretability is an important step in the data science workflow. Being able to explain how a model works serves many purposes, including building trust in the model’s output, satisfying regulatory requirements, model debugging, and verifying model safety, amongst other things. This AMP shows you how to leverage two industry-standard algorithms for interpretability.
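SHAP is built on Shapley values from cooperative game theory: each feature is credited with its average marginal contribution to the prediction across all coalitions of the other features. The sketch below is a conceptual illustration under stated assumptions, not the SHAP library’s API or the AMP’s code. It computes exact Shapley values for a tiny model by enumeration and treats a feature as "absent" by swapping in a single baseline value, which is a simplification of how SHAP uses background data. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on sampling or model-specific shortcuts instead. The `toy_churn_score` model is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline), by brute force."""
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep their real values; the rest take the
        # baseline value (our assumed stand-in for "feature absent").
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_churn_score(z: Sequence[float]) -> float:
    # Hypothetical linear scorer used only to exercise exact_shapley.
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]


phi = exact_shapley(toy_churn_score, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # [2.0, 6.0, -3.0]
```

For a linear model each feature’s Shapley value is simply its weight times its distance from the baseline, and the values always sum to the difference between the prediction and the baseline prediction; that additivity is what lets SHAP-style explanations decompose a single prediction feature by feature.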

Active Learning

Supervised machine learning, while powerful, needs labeled data to be effective. Active learning reduces the number of labeled examples needed to train a model, saving time and money while achieving performance comparable to models trained on much more data. This project launches an interactive visual workflow of active learning using the MNIST dataset.
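Active learning saves labels by letting the current model choose which unlabeled examples a human should label next. One common query strategy is least-confidence uncertainty sampling, sketched below; the description does not say which strategy the AMP uses, so treat this as an illustrative assumption. The sketch only covers the selection step: it assumes some model has already produced class probabilities for each example in the unlabeled pool (for MNIST, ten probabilities per image). The function name is hypothetical.

```python
from typing import List, Sequence


def least_confident_indices(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool examples the model is least sure about.

    pool_probs[i] holds the predicted class probabilities for pool example i.
    Confidence is the top-class probability; the lowest-confidence examples
    are the ones sent to a human annotator next.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    confidence = [max(p) for p in pool_probs]
    # sorted() is stable, so ties keep their original pool order
    ranked = sorted(range(len(pool_probs)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical probabilities for four unlabeled examples over three classes.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.55, 0.40, 0.05],
    [0.98, 0.01, 0.01],
]
print(least_confident_indices(pool_probs, 2))  # [1, 2]
```

In a full loop, the selected examples are labeled, added to the training set, the model is retrained, and the pool is scored again, repeating until the labeling budget runs out or accuracy stops improving.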

Deep Learning for Question Answering

Build an end-to-end extractive question answering system with this AMP, which features applications for information-retrieval-based question answering (IR-QA), model exploration, and data visualization.

Building The Future Of Enterprise ML Today

At Cloudera, we enable top global enterprises in industries from banking to telecommunications to manufacturing to deliver results quickly. With this release of AMPs, we’re only scratching the surface of what’s possible for businesses to do in the near future. Moving forward, we expect to add hundreds of AMPs across numerous industries and use cases. AMPs will enable your business to fundamentally change the way ML projects are built and delivered, leading to faster adoption, greater scale, and improved ROI. Stay tuned!

To learn more about Applied Machine Learning Prototypes in Cloudera Machine Learning (CML), join us for the webinar Jumpstart AI Use Cases With Applied ML Prototypes.

Alex Bleakley

Santiago Giraldo