Tag Archives: ETL

Announcing Support for Spot Instances in Cloudera Altus

Categories: Cloud

A month ago, we publicly announced Cloudera Altus, our new platform-as-a-service offering. Today, we are expanding the Altus data engineering service to support AWS EC2 Spot instances. Cloud infrastructure is the most costly component of running Altus data engineering workloads in the cloud. Altus EC2 Spot instance support makes it easy to significantly reduce that cost by letting users provision Altus data engineering clusters backed by excess EC2 compute capacity at reduced prices.


How To Set Up a Shared Amazon RDS as Your Hive Metastore

Categories: Cloud Hadoop Hive How-to Impala Spark Use Case

Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model is ideal when each cluster stores its data locally along with the metadata. In the cloud, however, many CDH clusters run directly on a shared object store (like Amazon S3), making it possible for the data to be shared across multiple clusters and to outlive any single cluster. In this scenario, each cluster has to regenerate and coordinate metadata for the underlying shared data on its own.
