Now Available: Cloudera Data Science Workbench Release 1.4

by Wim Stoop

Posted in Business | Technical | May 22, 2018

Cloudera Data Science Workbench (CDSW) makes secure, collaborative data science at scale a reality for the enterprise and accelerates the delivery of new data products. With CDSW, organizations can research and experiment faster, deploy models easily and with confidence, and rely on the wider Cloudera platform to reduce the risks and costs of data science projects. Teams can access data wherever it lives, from cloud object storage to data warehouses: CDSW provides connectivity not only to CDH but also to the other systems your data science teams rely on for analysis.

CDSW 1.4 now extends the platform experience from research to production. Two key new capabilities, Experiments and Models, let data scientists build, train, and deploy models in a unified workflow, while security enhancements automate user administration.

Experiments. As data scientists iteratively develop models, they often experiment with datasets, features, libraries, and algorithms, and tune hyperparameters. Each change can significantly affect the resulting model but is typically not recorded, which makes it impossible to reproduce and explain a given result. This leads to wasted time and effort during research and collaboration or, worse, to compliance risk.

With Experiments, data scientists can run a batch job that will:

create a snapshot of model code, dependencies, and configuration parameters necessary to train the model
build and execute the training run in an isolated container
track model metrics, performance, and any model artifacts the user specifies
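To make the idea concrete, here is a minimal Python sketch of what such a run record might capture. It is not CDSW's implementation or API: `ExperimentRun`, `snapshot_inputs`, `run_experiment`, and `toy_train` are hypothetical names, and the sketch assumes the code version, dependencies, and parameters can be serialized to JSON. Hashing a canonical snapshot gives every distinct combination of inputs its own stable ID, which is what lets a result be traced back to exactly what produced it.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExperimentRun:
    """Record of one training run (hypothetical structure, not the CDSW schema).

    frozen=True prevents fields from being reassigned after the run is recorded.
    """
    run_id: str                # short content hash of the snapshot
    snapshot: str              # canonical JSON of code version, dependencies, and parameters
    metrics: Dict[str, float]  # metrics reported by the training function


def snapshot_inputs(code_version: str, dependencies: Dict[str, str], params: Dict[str, object]) -> str:
    # sort_keys=True yields a canonical serialization, so identical inputs always hash identically.
    return json.dumps({"code": code_version, "deps": dependencies, "params": params}, sort_keys=True)


def run_experiment(
    code_version: str,
    dependencies: Dict[str, str],
    params: Dict[str, object],
    train: Callable[[Dict[str, object]], Dict[str, float]],
) -> ExperimentRun:
    # The snapshot is taken before training, so the recorded inputs reflect what was actually run.
    snapshot = snapshot_inputs(code_version, dependencies, params)
    run_id = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()[:12]
    # Pass a shallow copy so the training function cannot modify the caller's parameter dict.
    metrics = train(dict(params))
    return ExperimentRun(run_id=run_id, snapshot=snapshot, metrics=metrics)


def toy_train(params: Dict[str, object]) -> Dict[str, float]:
    # Hypothetical stand-in for real training: the "accuracy" depends only on the learning rate.
    return {"accuracy": 1.0 - abs(float(params["learning_rate"]) - 0.1)}


run_a = run_experiment("commit-abc123", {"numpy": "1.14"}, {"learning_rate": 0.1}, toy_train)
run_b = run_experiment("commit-abc123", {"numpy": "1.14"}, {"learning_rate": 0.3}, toy_train)
print(run_a.run_id != run_b.run_id)  # True: different parameters give a different snapshot ID
```

Rerunning with identical inputs reproduces the same `run_id`, so two results can be compared knowing precisely which inputs differed between them.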

Users can now inspect and compare their prior training runs to determine which model performs best, and then take the next steps, such as deploying that model.
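Choosing the best of several recorded runs amounts to an argmax (or argmin) over a single metric. The sketch below assumes each run is a plain dictionary with a `metrics` field; `best_run` is a hypothetical helper rather than part of CDSW, and the AUC values are made-up placeholders for illustration.

```python
from typing import Dict, List


def best_run(runs: List[Dict], metric: str, higher_is_better: bool = True) -> Dict:
    """Return the run with the best value of `metric`, skipping runs that did not record it."""
    candidates = [r for r in runs if metric in r["metrics"]]
    if not candidates:
        raise ValueError(f"no run recorded metric {metric!r}")

    def metric_value(run: Dict) -> float:
        return run["metrics"][metric]

    # For error-style metrics such as loss, pass higher_is_better=False to pick the minimum.
    return max(candidates, key=metric_value) if higher_is_better else min(candidates, key=metric_value)


# Placeholder runs with made-up metric values.
runs = [
    {"run_id": "run-1", "params": {"max_depth": 3}, "metrics": {"auc": 0.81}},
    {"run_id": "run-2", "params": {"max_depth": 5}, "metrics": {"auc": 0.86}},
    {"run_id": "run-3", "params": {"max_depth": 8}, "metrics": {"auc": 0.84}},
]
print(best_run(runs, "auc")["run_id"])  # run-2
```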

Models. Data scientists often develop models using a variety of open source Python and R packages. The hard part is exposing those models to different stakeholders: deploying models to production typically requires time-consuming, error-prone recoding as well as complex DevOps knowledge. Furthermore, keeping track of or rolling back deployed models poses significant version control challenges for data scientists and compliance officers alike.

With Models, data scientists can simply select a Python or R function within a project file, and Cloudera Data Science Workbench will:

create a snapshot of model code, saved model parameters, and dependencies
build an immutable executable container with the trained model and serving code
add a REST endpoint that automatically accepts input parameters matching the function signature, and that returns a data structure matching the function’s return type
save the built model container, along with metadata like who built or deployed it
deploy and start a specified number of model API replicas, automatically load balanced
let the user document, test, and share the model
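The step that maps a REST request onto the selected function can be pictured using only the standard library. This minimal sketch is not CDSW's serving code and leaves out the HTTP layer: `predict_churn` is a hypothetical model function with a placeholder scoring rule, and `handle_request` is a hypothetical helper that uses `inspect.signature` to check that the JSON fields match the function's parameter names before calling it and wrapping the return value in a JSON response.

```python
import inspect
import json
from typing import Callable


def predict_churn(tenure_months: int, monthly_charges: float) -> dict:
    # Hypothetical model function: a placeholder rule stands in for a trained model.
    score = min(1.0, monthly_charges / 100.0) * (1.0 if tenure_months < 12 else 0.5)
    return {"churn_probability": round(score, 3)}


def handle_request(fn: Callable[..., object], body: str) -> str:
    """Bind a JSON request body to fn's parameters, call fn, and return a JSON response."""
    payload = json.loads(body)
    try:
        # Raises TypeError if the body is not a JSON object, or if fields are missing or unexpected.
        bound = inspect.signature(fn).bind(**payload)
    except TypeError as exc:
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": fn(*bound.args, **bound.kwargs)})


print(handle_request(predict_churn, '{"tenure_months": 6, "monthly_charges": 80.0}'))
# {"result": {"churn_probability": 0.8}}
```

Binding against the signature rejects missing or unexpected fields up front, but it does not check value types; a production endpoint would also validate inputs against the annotations.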

Simplified user administration. Previous CDSW releases offered LDAP and SAML authentication but allowed every user to log in, which led to user sprawl and unintended license consumption. Designating CDSW administrators was a manual process within the tool itself.

With release 1.4, you can now designate LDAP and SAML groups for both users and administrators. With automatic synchronization, the ability to log in to or administer CDSW depends on group membership, so authorization is centralized in the system you already use for that purpose.
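Conceptually, group-based authorization reduces to mapping a user's directory groups to a role at login. The sketch below is a hypothetical illustration, not CDSW's implementation: the group names `cdsw-users` and `cdsw-admins` are placeholders for whatever LDAP or SAML groups an administrator configures, and it assumes that membership in an admin group also grants the right to log in.

```python
from typing import Iterable, Optional

# Hypothetical group names; substitute the LDAP/SAML groups configured for your deployment.
USER_GROUPS = {"cdsw-users"}
ADMIN_GROUPS = {"cdsw-admins"}


def resolve_role(member_of: Iterable[str]) -> Optional[str]:
    """Return "admin", "user", or None (login denied) based on directory group membership."""
    groups = set(member_of)
    if groups & ADMIN_GROUPS:
        return "admin"  # assumption: admin membership implies login access
    if groups & USER_GROUPS:
        return "user"
    return None  # not in any configured group, so no login and no license consumed


print(resolve_role(["cdsw-users", "finance"]))  # user
print(resolve_role(["finance"]))                # None
```

Because the role is derived from group membership on each login, removing someone from a directory group is enough to revoke access, with no separate change needed in the tool.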

Cloudera Data Science Workbench 1.4.x is supported on CDH 5.7 and higher 5.x versions. It requires Cloudera Manager 5.13 or a higher 5.x version for CSD-based deployments, and Cloudera Manager 5.11 or a higher 5.x version for package-based deployments. In addition to cloud options, customers can now deploy on premises with Oracle Linux 7.4 (for the Oracle Big Data Appliance). Full details are available in the online release notes.
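The support matrix above can be expressed as a small pre-installation check. This is a hypothetical helper, not a Cloudera tool: it assumes plain dotted numeric version strings such as `5.14.2` and encodes only the minimums stated in this post, so the release notes remain the authority.

```python
from typing import Tuple

# Minimums as stated in this post; every version must also stay on the 5.x line.
MIN_CDH = (5, 7)
MIN_CM = {"csd": (5, 13), "package": (5, 11)}


def parse_version(version: str) -> Tuple[int, ...]:
    # Assumes a plain dotted numeric string, e.g. "5.14.2" -> (5, 14, 2).
    return tuple(int(part) for part in version.split("."))


def is_supported(cdh: str, cloudera_manager: str, deployment: str) -> bool:
    """Check CDH and Cloudera Manager versions against the CDSW 1.4.x minimums."""
    if deployment not in MIN_CM:
        raise ValueError(f"unknown deployment type {deployment!r}")
    cdh_v = parse_version(cdh)
    cm_v = parse_version(cloudera_manager)
    # Tuple comparison is lexicographic, so (5, 14, 2) >= (5, 7) holds and (5, 6, 1) >= (5, 7) does not.
    return (
        cdh_v[0] == 5 and cdh_v >= MIN_CDH
        and cm_v[0] == 5 and cm_v >= MIN_CM[deployment]
    )


print(is_supported("5.14.2", "5.12.1", "csd"))      # False: CSD deployments need Cloudera Manager 5.13+
print(is_supported("5.14.2", "5.12.1", "package"))  # True: package deployments need 5.11+
```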

Learn more about how Cloudera Data Science Workbench makes your data science team more productive.

For existing Cloudera customers, CDSW Release 1.4 is available for download and trial here.

You can see the new capabilities in action in the replay of our webinar, Machine Learning Models: From Research to Production.

Wim Stoop

Director Product Marketing @theWimster
