Integrating Machine Learning Models into Your Big Data Pipelines in Real-Time With No Coding

Integrating Machine Learning Models into Your Big Data Pipelines in Real-Time With No Coding

[Editor’s note: This article was originally published on the Hortonworks Community Connection, but reproduced here because CDSW is now available on both Cloudera and Hortonworks platforms.]

Using Deployed Models as a Function as a Service

104409 dataengineering 104410 datascience 104431 flowmanagement

Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models from Apache NiFi as part of flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html),  but it will work for all CDSW regardless of install type.

In my simple example, I built a Python model that uses TextBlob to run sentiment analysis against a passed-in sentence. It returns Sentiment Polarity and Subjectivity, which we can immediately act upon in our flow.

CDSW is extremely easy to work with and I was up and running in a few minutes. For my model, I created a python 3 script and a shell script for install details. Both of these artifacts are available here: https://github.com/tspannhw/nifi-cdsw.

My Apache NiFi 1.8 flow is here (I use no custom processors): cdsw-twitter-sentiment.xml.

104433 socialmediacdswarch

Deploying a Machine Learning Model as a REST Service

104419 installtextblobcorpora

Once you login to CDSW, you may create a project or choose an existing one (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_projects.html). From your project, open workbench and you can install some libraries and test some Python. I am using a Python 3 session to download the TextBlob/NLTK Corpora for NLP.

104425 buildamodel

Let’s Pip Install Some Libraries for Testing

104427 pipinstalltextblob

Let’s Create a New Model

104426 modeldeploy1

You choose your file (mine is sentiment.py; see github). The function name is actually sentiment. Notice a typo I had to rebuild this and deploy. You set up an example input (sentence is the input parameter name) and an example output. Input and output will be JSON since this is a REST API.

Let’s Deploy It (Python 3)

104428 modeldeploy2

After clicking the blue Deploy Model button, you will be able to view additional information, including standard output, standard error, status, number of REST calls received, and success.

Once a Model is Deployed We Can Control It

104424 cdswmodels

We can stop it, rebuild it or replace the files if need be. I had to update things a few times. The amount of resources used for the model rest hosting if your choice from a drop down. Since I am doing something small I picked the smallest model with only 1 virtual CPU and 2 GB of RAM. All of this is running in Docker on Kubernetes!

Once Deployed, It’s Ready To Test and Use From Apache NiFi

104430 sentimentanalysismodeloverview

Click test. See the JSON results, and we can now call it from an Apache NiFi flow.

Once Deployed We Can Monitor The Model

104422 cdswmonitoring

Let’s Run the Test

104420 cdswsentimentmodeltesting

See the status and response!

Apache NiFi Example Flow

104404 cdswflow1104405 cdswflow2

Step 1: Call Twitter

104407 gettwitterfeed

Step 2: Extract Social Attributes of Interest

104411 parsetwitter

Step 3: Build our web call with our access key and function parameter

104412 buildcdswcall

Step 4: Extract our string as a flow file to send to the HTTP Post

104413 extractaccesskey2

104414 extractaccesskey

Step 5: Call Our Cloudera Data Science Workbench REST API (see tester).

104415 posttocdsw

Step 6: Extract the two result values.

104416 parsingresultfromcdsw

Step 7: Let’s route on the sentiment

104417 routingsentimentpolarity

We can have negative (<0), neutral (0), positive (>0) and very positive (1) polarity of the sentiment. See TextBlob for more information on how this works.

Step 8: Send bad sentiment to a Slack channel for human analysis.

104406 putslack

We send all the related information to a Slack channel, including the message.

Example Message Sent to Slack

104434 slackmsg

Step 9: Store all the results (or some) in either Phoenix/HBase, Hive LLAP, Impala, Kudu or HDFS.

Results as Attributes

104423 polarityreturned

Slack Message Call

${msg:append(" User:"):append(${user_name}):append(${handle}):append(" Geo:"):append(${coordinates}):append(${geo}):append(${location}):append(${place}):append(" Hashtags:"):append(${hashtags}):append(" Polarity:"):append(${polarity}):append(" Subjectivity:"):append(${subjectivity}):append(" Friends Count:"):append(${friends_count}):append(" Followers Count:"):append(${followers_count}):append(" Retweet Count:"):append(${retweet_count}):append(" Source:"):append(${source}):append(" Time:"):append(${time}):append(" Tweet ID:"):append(${tweet_id})}

 

 

REST CALL to Model

{"accessKey":"from your workbench","request":{"sentence":"${msg:replaceAll('"', ''):replaceAll('n','')}"}}

 

 

Resources

Tim Spann
Principal Developer Advocate
More by this author

1 Comments

by it technical support on

Your content is very impressive and thanks for sharing this article. its very useful.

Leave a comment

Your email address will not be published. Links are not permitted in comments.