Category Archives: CDH

Cloudera SDX: Under the Hood

Categories: CDH

What is SDX?

Shared Data Experience (SDX) is what makes it possible to deploy Cloudera’s four core functions (Data Engineering, Data Science, Analytic DB, Operational DB) on a single platform.

Why does that matter?

First, each of those core functions is essential to any modern enterprise.

  • Data Engineering enables the business to run batch or stream processes that speed up ETL and train machine learning models.
  • Data Science enables the business to do exploratory data science at big data scale with full data security and governance.
  • Analytic DB delivers the fastest time-to-insight with the flexibility and agility to run in any environment and against any type of data.

How to Distribute your R code with sparklyr and Cloudera Data Science Workbench

Categories: CDH How-to Spark

sparklyr lets R users leverage the distributed computation power of Apache Spark without a lot of additional learning. It provides a dplyr backend for Spark, so R users can write almost the same code for both local and distributed calculation over Spark SQL.

Since sparklyr v0.6, we can run R code across our Spark cluster with spark_apply().

How To Predict ICU Mortality with Digital Health Data, DL4J, Apache Spark and Cloudera

Categories: CDH Data Science Spark

Modeling EHR Data in Healthcare

In this case study, we look at modeling electronic health record (EHR) data with deep learning and Deeplearning4j (DL4J). We draw inspiration from recent research showing that carefully designed neural network architectures can learn effectively from the complex, messy data collected in EHRs. Specifically, we describe how to train a long short-term memory recurrent neural network (LSTM RNN) to predict in-hospital mortality among patients hospitalized in the intensive care unit (ICU).

Customizing Docker Images in Cloudera Data Science Workbench

Categories: Altus CDH Cloud Data Science How-to Tools

This article shows how to build and publish a customized Docker image for use as an engine in Cloudera Data Science Workbench. Customizing the engine image lets you work with your favorite tool chain inside the web-based application.

Motivation:

Cloudera Data Science Workbench (CDSW) enables data scientists to use their favorite tools, such as R-, Python-, or Scala-based libraries, out of the box in an isolated, secure sandbox environment.

Deep Learning with Intel’s BigDL and Apache Spark

Categories: CDH Data Science Hadoop Spark

Cloudera recently published a blog post on how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing that line of work, in this post we discuss a viable alternative that is designed specifically for use with Spark and that works with data available in Spark and Hadoop clusters through a Scala or Python API.

The deep learning landscape is still evolving.
