What’s New in Cloudera Director 2.5?

Cloudera Director 2.5 brings cluster auto-repair functionality and improved support for AWS Spot instances. Support for Cloudera Manager’s external account feature has been added along with S3Guard support.

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features provide a simple, reliable, automated way to establish production-ready clusters in the cloud for big-data workloads and applications.

Cloudera Director Overview

In this post, you will learn about the new functionality in release 2.5. But first, if you’re new to Cloudera Director, let’s review what it does.

  • On-demand creation and termination of clusters: Using Cloudera Director, you can allocate and configure Cloudera Manager instances and highly available CDH clusters in the cloud provider of your choice. A single Cloudera Director instance can manage multiple cloud provider environments and the separate lifecycles of multiple Cloudera Managers and clusters. Cloudera Director lets you configure Cloudera Manager and cluster services like Hive to use databases hosted on external database servers that you maintain yourself or that Cloudera Director provisions for you through Amazon Relational Database Service (RDS).
  • Multi-cloud support: Cloudera Director supports creating clusters in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) through its cloud provider plugin architecture. A single Cloudera Director instance can work with multiple cloud providers at once. Because the plugin specification is open source, you can create a plugin to support other providers, either in-house or public.
  • On-demand grow and shrink of clusters: One of the main benefits of running Hadoop clusters in the cloud is being able to provision additional instances when demand increases, and to terminate instances when demand decreases. Cloudera Director, in concert with Cloudera Manager, does the work required to add new instances to and remove existing ones from your Hadoop clusters.
  • Programmatic and repeatable instantiation of clusters: Cloudera Director can consume cluster definitions specified in HOCON configuration files submitted through the Cloudera Director CLI or in JSON input sent to the Cloudera Director API. The flexibility and rich feature set of these input formats let you tailor Hadoop clusters to your needs. A cluster definition can include custom scripts to run after instance provisioning and cluster setup, or before cluster termination, to perform tasks like installing additional packages, configuring system settings, or saving off important data. Java and Python clients make it easy to work with the Cloudera Director API.
  • Long-running cluster support: Long-running clusters often require actions like upgrading CDH and Cloudera Manager, changing the topology of the cluster, and reconfiguring the cluster. Cloudera Director supports such modifications when using Cloudera Manager 5.11 or later.
  • Usage-based billing for Cloudera services: Usage-based billing can help you optimize your expenditures for transient clusters. With a pay-as-you-go billing ID from Cloudera, you can use your Cloudera Enterprise license as usual, but you are only charged for CDH services when they are running.
  • Security: Cloudera Director, like other Cloudera offerings, is committed to enabling secure deployments and applications. Cloudera Director’s own database is automatically encrypted, and Cloudera Director helps you configure Cloudera Manager and CDH clusters with Kerberos authentication, as well as deploy Cloudera Navigator for auditing, data lineage, and data discovery.
  • Powerful web user interface: Cloudera Director’s user interface provides a single dashboard to assess the health of all your clusters across all cloud providers and all Cloudera Manager deployments. It can also be used to bootstrap new clusters, grow and shrink existing clusters, and terminate clusters that are no longer needed. Exploring the web user interface is a great stepping stone to using the configuration file or API to deploy production-ready clusters.

New Features and Improvements in Cloudera Director 2.5

The main focus of Cloudera Director 2.5 is resilience to unexpected events in an instance’s lifecycle. The release introduces auto-repair functionality for worker nodes and makes bootstrap and grow operations much more resilient to instances disappearing.

AWS Spot instances and GCP preemptible instances offer significant cost savings, but the cloud provider can terminate them with very little notice. With this release, Cloudera Director handles instance loss during all phases of the cluster lifecycle. If an instance is lost during cluster creation or modification, Cloudera Director attempts to continue, and the operation succeeds as long as the cluster’s minimum instance counts are met. This is particularly useful for Spot and preemptible instances, but it also applies to on-demand instances.
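
The success rule described above can be sketched in a few lines of Python. This is a toy model, not Cloudera Director’s implementation: the `InstanceGroup` class and its `requested`, `minimum`, and `running` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    """Hypothetical model of one group of instances in a cluster."""
    name: str
    requested: int  # instances asked for
    minimum: int    # fewest instances the operation can proceed with
    running: int    # instances that survived provisioning


def operation_can_succeed(groups):
    """Return True if every group still meets its minimum count."""
    return all(g.running >= g.minimum for g in groups)


groups = [
    InstanceGroup("masters", requested=1, minimum=1, running=1),
    InstanceGroup("workers", requested=10, minimum=6, running=8),
]
# Two workers were lost, but the group is still above its minimum.
result_ok = operation_can_succeed(groups)

# Losing more workers drops the group below its minimum.
groups[1].running = 5
result_short = operation_can_succeed(groups)
```

In this model, a bootstrap that loses two of ten requested workers still succeeds, while one that loses five does not.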

Cloudera Director’s new auto-repair functionality will attempt to restore any instances that were requested but have gone missing during any part of the cluster lifecycle. For example:

  • Instances that were requested during the initial cluster bootstrap but were not successfully provisioned.
  • Instances that were requested as part of a grow modification to a cluster but were not successfully provisioned.
  • Instances that were terminated due to Spot price increases.

In any of these scenarios, Cloudera Director will periodically attempt to re-provision the missing instances until the cluster is back to its desired size. Cloudera Director will automatically repair all worker nodes. We recommend placing master nodes on on-demand instances to minimize the probability of their disappearance. If they do disappear, you can use Cloudera Director’s manual repair functionality to replace them.
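
The periodic re-provisioning described above resembles a simple reconciliation loop. The following Python sketch illustrates the idea under stated assumptions: `repair_pass`, `auto_repair`, and the `provision` callable are hypothetical names, and the simulated provider stands in for a real cloud API. It is not how Cloudera Director is implemented.

```python
def repair_pass(desired, running, provision):
    """Try once to replace each missing worker; return the new running count.

    `provision` is a hypothetical callable that returns True when the cloud
    provider hands back an instance and False when the request fails.
    """
    missing = desired - running
    for _ in range(max(missing, 0)):
        if provision():
            running += 1
    return running


def auto_repair(desired, running, provision, max_passes=10):
    """Repeat repair passes until the cluster is full or passes run out.

    A real scheduler would wait between passes; the delay is omitted here.
    """
    passes = 0
    while running < desired and passes < max_passes:
        running = repair_pass(desired, running, provision)
        passes += 1
    return running, passes


# Simulated provider: every other request fails (for example, no Spot capacity).
outcomes = iter([True, False] * 20)
final_running, passes_used = auto_repair(
    desired=10, running=7, provision=lambda: next(outcomes)
)
```

The loop only tracks a count of worker instances, which mirrors the point that automatic repair applies to worker nodes while master nodes are repaired manually.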

During initial cluster creation, users can now specify external cloud accounts for use by various cluster services. This allows services like Impala to query data in Amazon S3. Clusters can also be configured with S3Guard. Because of Amazon S3’s eventual consistency model, data may not be immediately visible after it is written, which can cause problems in multi-step ETL processes. S3Guard uses additional metadata to ensure that Amazon S3 data is consistently available across cluster services. To configure external accounts and S3Guard, refer to this example config file.
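
To see why extra metadata helps with the consistency problem, consider the toy Python model below. It is only an illustration of the general idea: the `EventuallyConsistentStore` and `GuardedStore` classes are hypothetical and do not reflect S3Guard’s actual code or the Amazon S3 API.

```python
class EventuallyConsistentStore:
    """Toy object store whose listings lag behind writes."""

    def __init__(self):
        self._objects = {}
        self._visible = set()  # keys that listings currently report

    def put(self, key, data):
        self._objects[key] = data  # stored, but not yet listed

    def settle(self):
        self._visible = set(self._objects)  # listing catches up

    def list_keys(self):
        return sorted(self._visible)


class GuardedStore:
    """Wraps the store and records every write in a metadata table."""

    def __init__(self, store):
        self._store = store
        self._metadata = set()

    def put(self, key, data):
        self._store.put(key, data)
        self._metadata.add(key)

    def list_keys(self):
        # Merge the possibly stale listing with the recorded writes.
        return sorted(set(self._store.list_keys()) | self._metadata)


raw = EventuallyConsistentStore()
guarded = GuardedStore(raw)
guarded.put("etl/step1/part-0000", b"rows")

raw_listing = raw.list_keys()          # stale: the new key is missing
guarded_listing = guarded.list_keys()  # includes the key just written
```

In a multi-step ETL job, the second step would list the output of the first. With the raw listing it could miss freshly written files, while the guarded listing reports them immediately.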

Finally, Cloudera Director can now synchronize cloud provider details such as instance size and image type. As a result, it can modify clusters using their intended instance details, not just the details that were established during cluster creation.

Using Cloudera Director

If you’re ready to give the latest version of Cloudera Director a try, here are the ways you can get started.

Send questions or feedback to the Cloudera Director community forum.

Michael Wilson is a Software Engineer, Cloudera Director