What’s New in Cloudera Director 2.2?

This new release adds support for Amazon EBS volumes and the ability to diagnose cluster bootstrap errors quickly.

Cloudera Director provides a simple, reliable, enterprise-grade way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud.

Cloudera Director makes it easier for customers to:

  • Deploy clusters in line with patterns native to cloud infrastructure
  • Define the desired cluster specification in one place, all the way down to the operating system, through a single interface
  • Repeatedly and programmatically instantiate these cluster definitions
  • Adapt to the dynamic nature of cloud infrastructure

Cloudera Director 2.2 provides additional mechanisms to get that initial cluster definition right and the ability to diagnose errors and iterate quickly.

Cloudera Director supports multiple cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. Cloudera Director 2.2 extends this capability by adding support for two popular storage options: Amazon EBS and Azure Standard Storage.

In this post, you will learn about the new functionality in Cloudera Director 2.2 that supports these themes. Before we get to those details, let’s revisit what Cloudera Director can do.

  • On-demand creation and termination of clusters: Using Cloudera Director, you can configure and set up Cloudera Manager instances and highly available CDH clusters in the cloud provider environment of your choice. A single Cloudera Director instance can manage multiple cloud provider environments, as well as the lifecycles of multiple Cloudera Manager instances and clusters. Cloudera Director also lets you configure Cloudera Manager and cluster services like Hive to use external databases hosted on separate database servers or provisioned through Amazon Relational Database Service (RDS).
  • On-demand growing and shrinking of clusters: One of the main benefits of cloud infrastructure is the ability to dynamically provision instances on demand. Cloudera Director leverages this benefit, and does the additional work required to turn these instances into functioning workers of your Hadoop clusters managed by Cloudera Manager.
  • Programmatic and repeated instantiation of clusters: Cloudera Director can consume a cluster definition via a HOCON-based configuration file submitted using the Cloudera Director CLI, or as JSON input using the Java and Python clients. The flexibility of these input formats allows for customized setups to get the most out of your Hadoop clusters. These cluster definitions can include custom scripts to be run after instance provisioning, after cluster setup, and before cluster termination. This functionality can be used to install custom packages after instances are provisioned, run pre-written configuration scripts using Cloudera Manager’s API on clusters set up by Cloudera Director, and move data off instances to object stores before terminating clusters.
  • Usage-based billing for Cloudera Services: Usage-based billing is particularly relevant for optimizing your spend on transient clusters. To enable usage-based billing, you need to get a billing ID from Cloudera. Thereafter, you can use Cloudera Enterprise as usual, except you will get charged for using CDH services only while they are running.
  • Security: Cloudera Director, like other Cloudera offerings, is committed to enabling secure deployments and applications. This includes encryption of Cloudera Director’s database, setting up Cloudera Manager and clusters with Kerberos authentication, and deploying Cloudera Navigator for audit, lineage, and data discovery.
  • Powerful Web User Interface: Cloudera Director’s user interface provides a single dashboard to assess the state of all your clusters, across all environments. The web user interface provides insights into the health of clusters across different Cloudera Manager deployments. It can also be used to bootstrap test clusters and explore Cloudera Director’s capabilities as a stepping stone to using the configuration file or API to deploy production-ready clusters.
Image: Quickly identify clusters with concerning hosts across all environments

Image: Easily drill down to both Hadoop-specific and cloud-specific details of each instance

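To make the idea of a programmatic, repeatable cluster definition from the list above more concrete, here is a minimal Python sketch that builds a definition as plain data and serializes it to JSON. It is an illustration only: every key name and script file name below is a hypothetical placeholder and not the actual Cloudera Director schema or client API, so consult the Cloudera Director user guide for the real format. The three script hook points are the ones named above (after instance provisioning, after cluster setup, and before cluster termination).

```python
import json


def build_cluster_definition(name, worker_count):
    """Build a cluster definition as a plain dict.

    Every key below is a hypothetical placeholder, NOT Cloudera Director's schema.
    """
    if worker_count < 1:
        raise ValueError("a cluster needs at least one worker")
    return {
        "name": name,
        "workers": {"count": worker_count},
        # The three hook points named in the post; the script names are made up.
        "scripts": {
            "afterInstanceProvisioning": ["install-custom-packages.sh"],
            "afterClusterSetup": ["configure-via-cm-api.sh"],
            "beforeClusterTermination": ["copy-data-to-object-store.sh"],
        },
    }


def to_json(definition):
    """Serialize with sorted keys so repeated runs produce identical text."""
    return json.dumps(definition, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(build_cluster_definition("etl-cluster", 5)))
```

Keeping the definition as data that is generated by a function, rather than edited by hand, is what makes it easy to version and to instantiate repeatedly with different names or worker counts.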

New Features in Cloudera Director 2.2

New features in Cloudera Director 2.2 include:

  • Support for AWS EBS volumes: AWS offers instance types that only support EBS volumes for storage, including the c4 and m4 series. With EBS volume support, users of Cloudera Director now have more instance types to choose from for the worker nodes in their clusters. More details on the use of these EBS volumes can be found in Cloudera’s AWS Reference Architecture and in the Cloudera Director user guide.
  • Support for Azure Standard Storage: Along with Azure Premium Storage, Cloudera Director now supports Azure Standard Storage for worker nodes. Virtual hard disks (VHDs) on Microsoft Azure Standard Storage do not provide the same throughput and performance guarantees as Premium Storage disks, but are a lower-cost option for price-sensitive workloads.
  • Validation of configuration file cluster definition: Configuration files are a common way to deploy production clusters using Cloudera Director. They allow cloud provider details, Cloudera Manager configurations, and CDH configurations to be specified in a single file in the HOCON format, which allows for version control and repeated deployments. Given how much information can be packed into one of these files, it can become hard to read a configuration file and understand the corresponding deployment. To help with this, the validate command of the Cloudera Director CLI now has a verbose option that generates an HTML file displaying the configurations in a more visually understandable way. An example snippet of such an HTML file is shown below.
Image: Verbose output of configuration file validation

This CLI command invocation looks like this:

cloudera-director validate myconfig.conf --lp.validate.verbose=true

Starting with Cloudera Director 2.2, this command also validates the service types (e.g., HDFS, HBASE) and role types (e.g., NAMENODE, MASTER), and warns if the provided strings are not recognized or if a role type does not belong to the service it is listed under.
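The kind of check described here can be pictured with a small Python sketch. This is not Cloudera Director's implementation: the catalog below is a deliberately tiny, hypothetical subset of service and role types, and the function name and warning wording are made up for illustration. It only shows the general idea of warning on unknown service strings, unknown role strings, and roles listed under the wrong service.

```python
# Hypothetical catalog for illustration only; the real list of service and
# role types is much larger and is maintained by the product itself.
KNOWN_ROLES = {
    "HDFS": {"NAMENODE", "DATANODE"},
    "HBASE": {"MASTER", "REGIONSERVER"},
}


def validate_services(definition):
    """Return warning strings for a {service type: [role types]} mapping."""
    warnings = []
    for service, roles in definition.items():
        if service not in KNOWN_ROLES:
            warnings.append(f"unrecognized service type: {service}")
            continue  # roles cannot be checked against an unknown service
        for role in roles:
            if role in KNOWN_ROLES[service]:
                continue
            # Distinguish a misplaced role from a string that is unknown everywhere.
            if any(role in known for known in KNOWN_ROLES.values()):
                warnings.append(f"role {role} does not belong to service {service}")
            else:
                warnings.append(f"unrecognized role type: {role}")
    return warnings


if __name__ == "__main__":
    for message in validate_services({"HDFS": ["NAMENODE", "MASTER"], "HBSE": ["MASTER"]}):
        print("WARNING:", message)
```

Returning warnings as a list, instead of raising on the first problem, mirrors the behavior described above, where validation warns about every unrecognized string rather than stopping at the first one.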

  • Collection of diagnostic logs: Spinning up clusters that are based on well-understood cluster specifications can occasionally fail because of network connectivity issues or seemingly innocuous changes such as using a new AMI. Such problems are hard to diagnose without the relevant Cloudera Manager and cluster logs. These logs are even more crucial when experimenting with different cluster topologies and configurations. To help with this, Cloudera Director 2.2 automatically collects diagnostic logs upon cluster bootstrap and update failures. This collection can also be invoked on demand at any point in time and is offered as an action prior to termination of the cluster or Cloudera Manager. The logs are downloaded to the Cloudera Director server instance, and additionally sent to Cloudera if you are a Cloudera Enterprise customer. You can see the status of the latest diagnostic log collection results in the Deployment details and Cluster details sections of the web user interface.

Image: Diagnostic Log Summary

Using Cloudera Director

If you’re itching to try out these features, you can do so in the following ways: