Cloudera Altus (launched in May 2017) is a platform-as-a-service (PaaS) offering that enables users to analyze and process data at scale on public cloud infrastructure. Altus was designed from the outset to support multiple clouds, in both its back-end architecture and its front-end workflows. With the announcement of Microsoft Azure support, Altus will be able to run data engineering workloads on Microsoft Azure through the same Altus API and CLI interfaces.
What is SDX?
Shared Data Experience (SDX) is Cloudera's secret ingredient that makes it possible to deploy Cloudera's four core functions (Data Engineering, Data Science, Analytic DB, and Operational DB) on a single platform.
Why does that matter?
First, each of those core functions is essential to any modern enterprise business.
- Data Engineering enables the business to run batch or stream processes that speed up ETL and train machine learning models.
- Data Science enables the business to do exploratory data science at big data scale with full data security and governance.
- Analytic DB delivers the fastest time-to-insight with the flexibility and agility to run in any environment and against any type of data.
Since the birth of big data, Cloudera University has been teaching developers, administrators, analysts, and data scientists how to use big data technologies. We have taught more than 50,000 people the details of Apache technologies such as HDFS, MapReduce, Hive, Impala, Sqoop, Flume, Kafka, core Spark, Spark SQL, Spark Streaming, and Spark MLlib.
sparklyr gives R users a way to harness the distributed computation of Apache Spark without much additional learning. sparklyr acts as a backend for dplyr, so R users can write almost the same code for local computation and for distributed computation over Spark SQL.
Since sparklyr v0.6, we can also run arbitrary R code across a Spark cluster with spark_apply().
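Conceptually, spark_apply() splits a Spark DataFrame into partitions, calls an R function on each partition's rows, and combines the returned rows into a new DataFrame. The sketch below mimics that partition-wise pattern in plain Python so it runs without a cluster. It is not sparklyr's implementation or API; the helper names (`partition`, `apply_per_partition`, `add_squared`) are hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

def partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Split rows round-robin into chunks (a stand-in for Spark partitions; hypothetical helper)."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

def apply_per_partition(parts: List[List[Row]], fn: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Call fn once per partition and concatenate the returned rows,
    loosely mirroring how spark_apply() combines per-partition results."""
    result: List[Row] = []
    for part in parts:
        result.extend(fn(part))
    return result

def add_squared(part: List[Row]) -> List[Row]:
    # The user function sees only the rows of its own partition.
    return [{**row, "x_sq": row["x"] ** 2} for row in part]

rows = [{"x": float(i)} for i in range(6)]
out = apply_per_partition(partition(rows, 3), add_squared)
print(sorted(r["x_sq"] for r in out))  # [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
```

Because the output is assembled partition by partition, row order follows the partitioning rather than the input, so the usage line sorts before printing. Code passed to spark_apply() should likewise not assume anything about row order across partitions.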
Modeling EHR Data in Healthcare
In this case study, we take a look at modeling electronic health record (EHR) data with deep learning and Deeplearning4j (DL4J). We draw inspiration from recent research showing that carefully designed neural network architectures can learn effectively from the complex, messy data collected in EHRs. Specifically, we describe how to train a long short-term memory recurrent neural network (LSTM RNN) to predict in-hospital mortality among patients hospitalized in the intensive care unit (ICU).
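To make the LSTM idea concrete, here is a minimal forward pass of a standard LSTM cell in plain Python, assuming a many-to-one setup in which a patient's sequence of time steps is reduced to a single mortality probability through a sigmoid on the final hidden state. This is not the case study's network or DL4J code (in the case study, DL4J handles the model and training). The weights, the two input features per time step, and the sample ICU stay are hypothetical, untrained placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Gate = Tuple[Matrix, Matrix, Vector]  # (W: input weights, U: recurrent weights, b: bias)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def affine(gate: Gate, x: Vector, h: Vector) -> Vector:
    """Return W @ x + U @ h + b for a single gate."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x: Vector, h: Vector, c: Vector, params: Dict[str, Gate]) -> Tuple[Vector, Vector]:
    """One standard LSTM time step: input (i), forget (f), output (o) gates and candidate (g)."""
    i = [sigmoid(z) for z in affine(params["i"], x, h)]
    f = [sigmoid(z) for z in affine(params["f"], x, h)]
    o = [sigmoid(z) for z in affine(params["o"], x, h)]
    g = [math.tanh(z) for z in affine(params["g"], x, h)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # update cell memory
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]            # expose gated state
    return h_new, c_new

def predict_mortality(seq: Sequence[Vector], params: Dict[str, Gate], w_out: Vector, b_out: float) -> float:
    """Run the LSTM over a patient's time steps and map the final hidden state to a probability."""
    hidden = len(params["i"][2])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)

# Hypothetical, untrained weights: 2 input features per time step, hidden size 2.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
params = {name: (W, U, b) for name in ("i", "f", "o", "g")}

# Hypothetical time steps (e.g., two normalized vital signs) for one ICU stay.
stay = [[0.2, 1.0], [0.4, 0.8], [0.9, 0.1]]
print(predict_mortality(stay, params, w_out=[1.0, -1.0], b_out=0.0))
```

The forget gate is what lets the cell carry information across many time steps, which is why LSTMs suit long, irregular patient histories. A trained model would learn these weights from labeled ICU stays rather than using fixed placeholders.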