Tag Archives: Cloudera Data Science Workbench

Create conda recipe to use C extended Python library on PySpark cluster with Cloudera Data Science Workbench

Categories: CDH Data Science How-to Spark

Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we introduced how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In the Python world, data scientists often want to use libraries such as XGBoost that include C/C++ extensions. This post shows how to solve this problem by creating a conda recipe for a library with a C extension.
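
Why C extensions need special handling can be sketched in a few lines of Python. A pure-Python module is just source code, but a C/C++ extension is a compiled shared library (a file ending in `.so` on Linux or `.pyd` on Windows) built for a specific OS, CPU architecture, and Python ABI, so it has to match the platform of the Spark executors. The hypothetical helper `find_extension_modules` below is a minimal sketch using only the standard library, not code from the original post: it walks an installed package directory and lists files whose names end in one of the suffixes in `importlib.machinery.EXTENSION_SUFFIXES`.

```python
"""Sketch: list compiled extension files inside an installed package.

Pure-Python files can be shipped to executors as plain source, but files
ending in a platform extension suffix (e.g. ".so" / ".pyd") are compiled
binaries that must match the executors' OS, CPU architecture, and Python ABI.
"""
import importlib.machinery
import os
import sys


def find_extension_modules(package_dir):
    """Return sorted paths (relative to package_dir) of files with an extension suffix.

    Because ".so" is itself a suffix on Linux, plain shared libraries bundled
    next to the extension modules are reported too; they are platform-specific
    as well, so that is usually what you want for this check.
    """
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(suffixes):
                found.append(os.path.relpath(os.path.join(root, name), package_dir))
    return sorted(found)


if __name__ == "__main__":
    # Usage: python find_ext.py /path/to/site-packages/some_package
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_extension_modules(target):
        print(path)
```

Pointing it at an installed package directory gives a quick check of whether that package contains compiled code that would have to be built for the cluster's platform rather than simply copied.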


Getting Started with Cloudera Data Science Workbench

Categories: CDH Data Science

Last week, Cloudera announced the General Availability release of Cloudera Data Science Workbench. In this post, I’ll give a brief overview of its capabilities and architecture, along with a quick-start guide to connecting Cloudera Data Science Workbench to your existing CDH cluster in three simple steps.

At its core, Cloudera Data Science Workbench enables self-service data science for the enterprise. Data scientists can build, scale, and deploy data science and machine learning solutions in a fraction of the time…


The Benefits of Migrating HPC Workloads To Apache Spark

Categories: CDH Data Science Hadoop Spark

Overview

Recently we worked with a customer that needed to run a very large number of models each day to satisfy internal and government-regulated risk requirements. Several thousand model executions would need to be supported per hour, and total execution time was very important to this client. In the past, the customer used thousands of servers to meet the demand. They needed to run many derivations of this model with different economic factors to satisfy their requirements.
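
The workload described here, many independent runs of one model under different economic factors, is embarrassingly parallel. The sketch below is an illustration under stated assumptions: the factor names, values, and the toy `run_model` function are hypothetical placeholders, not the customer's model. It builds every combination of factors with `itertools.product` and applies the model to each scenario; because each run depends only on its own input, the same function could be distributed across a Spark cluster with `SparkContext.parallelize(...).map(...).collect()`, as noted in the code comment.

```python
"""Hypothetical sketch: fan out one model over many economic scenarios.

All factor names, values, and the model itself are illustrative placeholders.
"""
from itertools import product

# Hypothetical economic factors; each combination is one model derivation.
INTEREST_RATES = [0.01, 0.02, 0.03]
UNEMPLOYMENT_RATES = [0.04, 0.06]
GDP_GROWTH = [-0.01, 0.02]


def build_scenarios():
    """Return every combination of factors as a list of dicts (3 * 2 * 2 = 12)."""
    return [
        {"rate": r, "unemployment": u, "gdp_growth": g}
        for r, u, g in product(INTEREST_RATES, UNEMPLOYMENT_RATES, GDP_GROWTH)
    ]


def run_model(scenario):
    """Toy stand-in for a risk model: returns (scenario, estimated loss).

    It depends only on its input, so runs can execute in any order or place.
    """
    base_exposure = 1000.0
    loss = (
        base_exposure
        * (1 + scenario["rate"])
        * (1 + scenario["unemployment"])
        * (1 - scenario["gdp_growth"])
    )
    return scenario, round(loss, 2)


def run_locally(scenarios):
    """Run every scenario sequentially on one machine."""
    return [run_model(s) for s in scenarios]


# On a Spark cluster (with an existing SparkContext `sc`), the same top-level
# function could be distributed instead of looping locally:
#   results = sc.parallelize(scenarios).map(run_model).collect()

if __name__ == "__main__":
    results = run_locally(build_scenarios())
    print(f"{len(results)} model runs completed")
```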


BigDL on CDH and Cloudera Data Science Workbench

Categories: CDH How-to Spark

Introduction

As companies strive to implement modern solutions based on deep learning frameworks, the need to deploy them on existing hardware infrastructure in a scalable and distributed manner comes to the fore. Recognizing this need, Cloudera’s and Intel’s Big Data Technologies engineering teams jointly detail Intel’s BigDL Apache Spark deep learning library on the latest release of Cloudera Data Science Workbench. This collaborative effort allows customers to build new deep learning applications with the BigDL Spark library by leveraging the existing homogeneous compute capacity of their Xeon servers running Cloudera Enterprise, without having to invest in expensive GPU farms or bring up parallel frameworks such as TensorFlow or Caffe.


Use your favorite Python library on PySpark cluster with Cloudera Data Science Workbench

Categories: CDH Data Science How-to Spark

Cloudera Data Science Workbench provides freedom for data scientists. It gives them the flexibility to work with their favorite libraries using isolated environments with a container for each project.

In the JVM world of Java and Scala, using your favorite packages on a Spark cluster is easy. Each application bundles its preferred packages into a fat JAR, which gives every application its own independent environment on the Spark cluster. Many data scientists prefer Python to Scala for data science…
