Category Archives: Data Science

Cloudera Enterprise 5.12 is Now Available

Categories: Altus CDH Cloud Cloudera Manager Cloudera Navigator Data Science Hue Impala Kafka Kudu

Cloudera is pleased to announce that Cloudera Enterprise 5.12 is now generally available (GA). The release includes enhancements for running in cloud environments (with broader ADLS support and improved AWS Spot Instance support), usability and productivity improvements for both data science and analytic workloads, as well as performance gains and self-service performance management across a range of workloads.

As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack.


Deep learning on Apache Spark and Apache Hadoop with Deeplearning4j

Categories: Data Science Hadoop Spark

In late 2016, Ben Lorica of O’Reilly Media declared that “2017 will be the year the data science and big data community engage with AI technologies.” Deep learning on GPUs had pervaded universities and research organizations before 2017, but distributed deep learning on CPUs is now beginning to gain widespread adoption across a diverse set of companies and domains. While GPUs provide top-of-the-line performance in numerical computing, CPUs are also becoming more efficient, and much of today’s existing hardware already has CPU computing power available in bulk.


Create conda recipe to use C extended Python library on PySpark cluster with Cloudera Data Science Workbench

Categories: CDH Data Science How-to Spark

Cloudera Data Science Workbench provides data scientists with secure access to enterprise data with Python, R, and Scala. In the previous article, we showed how to use your favorite Python libraries on an Apache Spark cluster with PySpark. In the Python world, data scientists often want to use libraries, such as XGBoost, that include C/C++ extensions. This post shows how to solve this problem by creating a conda recipe with a C extension.


Getting Started with Cloudera Data Science Workbench

Categories: CDH Data Science

Last week, Cloudera announced the General Availability release of Cloudera Data Science Workbench. In this post, I’ll give a brief overview of its capabilities and architecture, along with a quick-start guide to connecting Cloudera Data Science Workbench to your existing CDH cluster in three simple steps.

At its core, Cloudera Data Science Workbench enables self-service data science for the enterprise. Data scientists can build, scale, and deploy data science and machine learning solutions in a fraction of the time,


The Benefits of Migrating HPC Workloads To Apache Spark

Categories: CDH Data Science Hadoop Spark


Recently we worked with a customer that needed to run a very large number of models each day to satisfy internal and government-regulated risk requirements. Several thousand model executions per hour would need to be supported, and total execution time was very important to this client. In the past, the customer used thousands of servers to meet the demand. They needed to run many derivations of this model with different economic factors to satisfy their requirements.
