Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model works well when each cluster stores its data locally alongside the metadata. In the cloud, however, many CDH clusters run directly on a shared object store (such as Amazon S3), so the same data can be used by multiple clusters and can outlive any single cluster. In that scenario, each cluster has to regenerate and coordinate metadata for the underlying shared data on its own.
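To make the shared-metadata idea concrete, here is a minimal Java sketch (not from the original post) of a client that points at a single, long-running Hive Metastore service instead of a per-cluster backend database; the thrift URI and host name are illustrative assumptions.

```java
import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class SharedMetastoreExample {
    public static void main(String[] args) throws Exception {
        // Point this client (which could run in any transient cluster) at a
        // long-running, shared Hive Metastore service; the URI is illustrative.
        HiveConf conf = new HiveConf();
        conf.set("hive.metastore.uris", "thrift://shared-hms.example.com:9083");

        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // Every cluster configured against the same metastore sees the same
            // databases and tables describing the shared data on the object store.
            List<String> databases = client.getAllDatabases();
            databases.forEach(System.out::println);
        } finally {
            client.close();
        }
    }
}
```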
Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features give you a simple, reliable, automated way to stand up production-ready clusters in the cloud for big data workloads and applications.
Cloudera Director Overview
In this post, you will learn about new functionality in release 2.3, but first, if you’re new to Cloudera Director, let’s revisit what it does.
- On-demand creation and termination of clusters: Using Cloudera Director, you can spin up clusters when they are needed and terminate them when the work is done.
Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost optimizations, and cloud-native data governance for Amazon S3.
As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
- GA of Apache Kudu – Apache Kudu, the new columnar storage engine for fast analytics on fast data, is now generally available and supported for production use.
In Part 1 of this series, we covered the prerequisites for deploying a CDH cluster on the Microsoft Azure cloud platform. In Part 2, we cover the resources required on Azure and walk through deploying a cluster with Cloudera Director.
Cloudera Director Use Case
Cloudera Director simplifies cluster creation and shortens the time to an operational cluster in the cloud. It's a great tool for running proofs of concept (POCs) in your organization.
Learn how to use Cloudera Director, Microsoft Active Directory (AD DS, AD CS, AD DNS), Samba, and SSSD to deploy a secure EDH cluster for workloads in the public cloud.
Authenticating users is the first line of security we recommend for Apache Hadoop. As in most, if not all, relational database management systems (RDBMSs), each user is given a username and password to validate their identity, and they must do so before they can access any data those systems manage.
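In a secure (Kerberized) Hadoop cluster, that identity check typically happens through Kerberos rather than a plain password prompt. The following is a minimal Java sketch, not taken from the post, of a client logging in from a keytab before reading HDFS; the principal, keytab path, and directory are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginExample {
    public static void main(String[] args) throws Exception {
        // Enable Kerberos authentication for Hadoop clients.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate with a keytab; the principal and keytab path are illustrative.
        UserGroupInformation.loginUserFromKeytab(
                "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

        // Only after a successful login can the user access data in HDFS.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/user/analyst"))) {
            System.out.println(status.getPath());
        }
    }
}
```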