Tag Archives: aws

Data Engineering with Cloudera Altus

Categories: Altus Cloud Hive Spark

With modern businesses dealing with an ever-increasing volume of data, and an expanding set of data sources, the data engineering process that enables analysis, visualization, and reporting only becomes more important.

When considering running data engineering workloads in the public cloud, there are capabilities which enable different operational models from on-premises deployments. The key factors here are the presence of a distinct storage layer within the cloud environment, and the ability to provision compute resources on-demand (e.g.: with Amazon’s S3 and EC2 respectively).

Read More

What’s New in Cloudera Director 2.4?

Categories: CDH Cloud

Cloudera Director 2.4 improves support for long-running clusters by syncing with upgrades and topology changes via Cloudera Manager, and adds support for Spark 2 and Kudu. Cloudera Director along with CM and CDH5.11 adds support for Microsoft Azure Data Lake Store (ADLS), and pausing of clusters with Amazon EBS volumes.

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice.

Read More

Analyzing US flight data on Amazon S3 with sparklyr and Apache Spark 2.0

Categories: CDH Data Science Hadoop Spark Use Case

We posted several blog posts about sparklyr (introduction, automation), which enables you to analyze big data leveraging Apache Spark seamlessly with R. sparklyr, developed by RStudio, is an R interface to Spark that allows users to use Spark as the backend for dplyr, which is the popular data manipulation package for R.

If you are interested in sparklyr, you can learn how to use it with the official document,

Read More

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS

Categories: Cloud Impala Performance

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases.

Impala 2.6 brings read/write support on Amazon S3, which provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility that are unique amongst cloud-based analytic databases. With more and more users looking to deploy and run in public-cloud environments,

Read More

How-to: Deploy a Secure Enterprise Data Hub on AWS (Part 2)

Categories: Cloud How-to Platform Security & Cybersecurity

Learn how to use Cloudera Director, Microsoft Active Directory, and Centrify Express to deploy a secure EDH cluster for workloads in the public cloud.

In Part 1 of this series, you learned about configuring Microsoft Active Directory and Centrify Express for optimal security across your Cloudera-powered EDH, whether for on-premise or public-cloud deployments. In this concluding installment, you’ll learn the cloud-specific pieces in this process, including some AWS fundamentals and in-depth details about cluster provisioning using Cloudera Director.

Read More