Cloudera Data Science Workbench (CDSW) makes secure, collaborative data science at scale a reality for the enterprise and accelerates the delivery of new data products. With CDSW, organizations can research and experiment faster, deploy models easily and with confidence, and rely on the wider Cloudera platform to reduce the risks and costs of data science projects. Access any data anywhere, from cloud object storage to data warehouses: CDSW provides connectivity not only to CDH but also to the systems your data science teams rely on for analysis.
CDSW 1.4 now extends the platform experience from research to production. Two key new capabilities, Experiments and Models, let data scientists build, train, and deploy models in a unified workflow; security enhancements automate user administration.
Experiments. As data scientists iteratively develop models, they often experiment with datasets, features, libraries, and algorithms, and tune hyperparameters. Each change can significantly impact the resulting model but is not typically recorded, making it impossible to reproduce and explain a given result. This leads to wasted time and effort during research and collaboration or, worse, compliance risk.
With Experiments, data scientists can run a batch job that will:
- create a snapshot of model code, dependencies, and configuration parameters necessary to train the model
- build and execute the training run in an isolated container
- track model metrics, performance, and any model artifacts the user specifies
Users can now inspect and compare their prior training runs to determine which model is best, and then take the next steps, such as deploying the best model.
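As a rough illustration of what such a batch job might look like, here is a minimal training script sketch. It assumes the `cdsw` Python library that ships with the workbench exposes `track_metric` and `track_file` helpers; the local stand-in class, the toy model, and names such as `run_experiment` and `model.json` are hypothetical and exist only so the sketch runs outside CDSW.

```python
import json
import random

try:
    import cdsw  # assumption: available inside a CDSW session or experiment
except ImportError:
    # Hypothetical local stand-in so the sketch also runs outside CDSW.
    class _LocalTracker:
        def __init__(self):
            self.metrics = {}
            self.files = []

        def track_metric(self, key, value):
            self.metrics[key] = value

        def track_file(self, path):
            self.files.append(path)

    cdsw = _LocalTracker()


def train(learning_rate, epochs, seed=0):
    """Fit y = w * x to noisy synthetic data with plain gradient descent."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(1, 51)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return w, mse


def run_experiment(learning_rate=0.05, epochs=200, out_path="model.json"):
    """Train once, save the fitted parameter, and record the run's results."""
    w, mse = train(learning_rate, epochs)
    with open(out_path, "w") as f:
        json.dump({"w": w}, f)
    cdsw.track_metric("mse", mse)  # metric to compare across runs
    cdsw.track_file(out_path)      # artifact to keep with the run
    return w, mse
```

Calling `run_experiment()` with different `learning_rate` or `epochs` values would produce runs whose tracked `mse` can then be compared side by side.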
Models. Data scientists often develop models using a variety of open source Python and R packages. The hard part is exposing those models to different stakeholders: deploying models to production typically requires time-consuming and error-prone recoding, as well as complex DevOps knowledge. Furthermore, keeping track of or rolling back deployed models poses significant version control challenges for data scientists and compliance officers alike.
With Models, data scientists can simply select a Python or R function within a project file, and Cloudera Data Science Workbench will:
- create a snapshot of model code, saved model parameters, and dependencies
- build an immutable executable container with the trained model and serving code
- add a REST endpoint that automatically accepts input parameters matching the function signature, and that returns a data structure matching the function’s return type
- save the built model container, along with metadata like who built or deployed it
- deploy and start a specified number of model API replicas, automatically load balanced
- let the user document, test, and share the model
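To make the idea of "selecting a function" concrete, here is a sketch of the kind of Python function a data scientist might point Models at. It assumes the function receives the decoded request input as a single dictionary and returns a JSON-serializable value; the function name `predict`, the `petal_length` field, and the hard-coded weights are hypothetical placeholders rather than anything prescribed by CDSW.

```python
import json
import math

# Hypothetical saved parameters; in practice these would be loaded from a
# file produced by a training run and snapshotted with the project.
WEIGHTS = {"petal_length": 1.8, "bias": -4.5}


def predict(args):
    """Score one observation with a tiny logistic model.

    `args` is assumed to be a dict of input parameters,
    for example {"petal_length": 3.1}.
    """
    x = float(args["petal_length"])
    z = WEIGHTS["petal_length"] * x + WEIGHTS["bias"]
    probability = 1.0 / (1.0 + math.exp(-z))
    # Return plain Python types so the result serializes cleanly to JSON.
    return {"probability": round(probability, 4), "label": int(probability >= 0.5)}


# Local check before deploying: the response must survive a JSON round trip.
response = predict({"petal_length": 3.1})
assert json.loads(json.dumps(response)) == response
```

Keeping the function free of side effects and limited to plain inputs and outputs makes it easy to test locally before it is built into a container and served.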
Simplified user administration. Previous CDSW releases offered LDAP and SAML authentication but allowed every user to log in. The consequence was user sprawl and unintended license consumption. Designating CDSW administrators was a manual affair in the tool itself.
With release 1.4 you can now designate the LDAP and SAML groups for both users and administrators. With automatic synchronization, the ability to log in or administer CDSW is now dependent on group membership; authorization is now centralized in the system you already use for that purpose.
Cloudera Data Science Workbench 1.4.x is supported on CDH 5.7 or higher 5.x versions. CSD-based deployments require Cloudera Manager 5.13 or higher 5.x versions; package-based deployments require Cloudera Manager 5.11 or higher 5.x versions. In addition to cloud options, customers can now deploy on premises with Oracle Linux 7.4 (for the Oracle Big Data Appliance). Full details are available from the online release notes.
Learn more about how Cloudera Data Science Workbench makes your data science team more productive.
For existing Cloudera customers, CDSW Release 1.4 is available for download and trial here.
You can see the new capabilities in action in the replay of our webinar, Machine Learning Models: From Research to Production.