Since the launch of Cloudera Data Science Workbench (CDSW) in 2017 we’ve focused on accelerating enterprise data science from research to production. We’re helping hundreds of customers like IQVIA and Deutsche Telekom build their own AI factories by enabling large data science teams with secure, self-service access to the business data, compute resources, and open-source tools and libraries they need to innovate and impact business faster.
Improving data science team productivity with a joyful user experience remains a key focus in our mission to empower customers to industrialize machine learning and AI, and today we’re excited to announce the availability of CDSW version 1.6 with support for third-party editors including Jupyter Notebook, RStudio, PyCharm and more. CDSW enables team collaboration on the end-to-end data science workflow, from data exploration and data engineering to model development and deployment in production. This can involve collaboration among data engineers, data scientists and ML engineers who often have different editor and IDE preferences. Now diverse teams can tap into the benefits of self-service data science for the enterprise with CDSW, all while working within their most familiar or preferred IDE.
How it Works
Browser based IDEs
Not unlike sports teams, when it comes to IDE preferences sometimes everyone has a favorite, everyone’s favorite is different, and nobody is wrong. Indeed, across Cloudera customers we regularly meet fans of both Jupyter Notebooks and RStudio, for example. CDSW 1.6 now runs these and other web-based editors directly within CDSW through installation in an engine by an administrator or at the project-level by a CDSW user, just like any other library. Jupyter Notebook support comes pre-packaged within CDSW and RStudio can be quickly installed using the provided instructions.
To use Jupyter Notebooks in CDSW, for example, a user simply accesses their CDSW Project:
Starts a new interactive Session, choosing the Jupyter Notebook editor:
And can proceed to access and edit CDSW Project files using a Jupyter Notebook within the CDSW browser-based environment:
IDEs local to your machine
Other coders on the team including ML and DevOps engineers often work in local IDEs such as PyCharm. These applications run locally on the user’s computer and connect to CDSW remotely over SSH for code completion and execution. They must be configured per user and are not associated at the project level in CDSW. The documentation provides sample instructions for the Professional Edition of PyCharm v2019.1.
Together these approaches address the more popular editor preferences across Cloudera’s customers and were tested as part of CDSW 1.6 development. Other browser-based and local IDEs can be installed and configured. Please see the CDSW documentation to learn more about these and other security and administration features new in CDSW version 1.6.
Cloudera Data Science Workbench 1.6.x is supported on the following versions of CDH with Cloudera Manager, in addition to HDP: for RPM-based deployments, CDH 5.7 or higher and CDH 6.1.x or higher; Cloudera Manager 5.11 or higher and Cloudera Manager 6.1.x or higher. For CSD-based deployments, CDH 5.7 or higher and CDH 6.1.x or higher; Cloudera Manager 5.13 or higher and Cloudera Manager 6.1.x or higher. For RPM-based deployments on HDP: HDP 2.6.5 or higher and HDP 3.1.0 or higher.
See the new capabilities in action and learn more about how Cloudera Data Science Workbench accelerates enterprise data science from research to production in the CDSW resource center.
For existing Cloudera customers, CDSW version 1.6 is available for download and trial here.