Tag Archives: Altus

Introducing S3Guard: S3 Consistency for Apache Hadoop

Categories: Altus CDH Cloud Hadoop

Synopsis

This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges of running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works.

Problem

Although Apache Hadoop supports using Amazon Simple Storage Service (S3) as a Hadoop filesystem, S3 behaves differently from HDFS. One of the key differences is the level of consistency provided by the underlying filesystem.

Read more

Announcing Support for Spot Instances in Cloudera Altus

Categories: Cloud

A month ago, we publicly announced Cloudera Altus, our new platform-as-a-service offering, and today we are expanding the Altus data engineering service to support AWS EC2 Spot instances. Cloud infrastructure is the most costly component of running Altus data engineering workloads in the cloud. Altus EC2 Spot instance support makes it easy to significantly reduce that cost by allowing users to provision Altus data engineering clusters backed by excess EC2 compute capacity at reduced prices.

Read more

Data Engineering with Cloudera Altus

Categories: Altus Cloud Hive Spark

As modern businesses deal with an ever-increasing volume of data and an expanding set of data sources, the data engineering process that enables analysis, visualization, and reporting becomes ever more important.

For data engineering workloads, the public cloud offers capabilities that enable operational models different from those of on-premises deployments. The key factors are a distinct storage layer within the cloud environment and the ability to provision compute resources on demand (e.g., Amazon’s S3 and EC2, respectively).

Read more