Category Archives: CDH

Analyzing US flight data on Amazon S3 with sparklyr and Apache Spark 2.0

Categories: CDH Data Science Hadoop Spark Use Case

We have published several blog posts about sparklyr (introduction, automation), which lets you analyze big data with Apache Spark seamlessly from R. sparklyr, developed by RStudio, is an R interface to Spark that lets you use Spark as the backend for dplyr, the popular data manipulation package for R.

If you are interested in sparklyr, you can learn how to use it from the official documentation,

Read More

Up and running with Apache Spark on Apache Kudu

Categories: CDH Data Ingestion Data Science General Hadoop How-to Impala Kudu Spark Training Use Case

After the GA of Apache Kudu in Cloudera CDH 5.10, we take a look at the Apache Spark on Kudu integration, share code snippets, and explain how to get up and running quickly, as Kudu is already a first-class citizen in Spark’s ecosystem.

As the Apache Kudu development team celebrates the initial 1.0 release launched on September 19, and the most recent 1.2.0 version now GA as part of Cloudera’s CDH 5.10 release,

Read More

What’s New in Cloudera Director 2.3?

Categories: CDH Cloud Cloudera Manager Hadoop

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features provide a mechanism for establishing production-ready clusters in the cloud for big-data workloads and applications in a simple, reliable, automated fashion.

Cloudera Director Overview

In this post, you will learn about new functionality in release 2.3, but first, if you’re new to Cloudera Director, let’s revisit what it does.

  • On-demand creation and termination of clusters: Using Cloudera Director,

Read More

Cloudera Enterprise 5.10 is Now Available

Categories: CDH Cloud Cloudera Manager Cloudera Navigator Hadoop Hue Kudu

Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost-optimizations, and cloud-native data governance for Amazon S3.

As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):

  • GA of Apache Kudu

Read More

How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 2

Categories: CDH Cloud How-to Ops and DevOps Platform Security & Cybersecurity

In Part 1 of the blog, we covered all the prerequisites needed to deploy a CDH cluster on the Microsoft Azure cloud platform. In Part 2, we will cover the resources required on the Azure platform and actually deploy a cluster with Cloudera Director.

Cloudera Director Use Case

Cloudera Director simplifies cluster creation and shortens the time to an operational cluster in the cloud. It’s a great tool for running POCs in your organization.

Read More