In Part 1 of the blog, we covered all the prerequisites needed to deploy a CDH cluster on the Microsoft Azure cloud platform. In Part 2, we will cover the resources required on the Azure platform and actually deploy a cluster with Cloudera Director.
Cloudera Director Use Case
Cloudera Director simplifies cluster creation and lessen the time to an operational cluster on the cloud. It’s a great tool for running POCs in your organization.
Learn how to use Cloudera Director, Microsoft Active Directory (AD DS, AD CS, AD DNS), SAMBA, and SSSD to deploy a secure EDH cluster for workloads in the public cloud.
Authenticating users in Apache Hadoop is the first line of security we recommend. Like most, if not all RDBMS, a user is provided with a username and a password to validate their identity. This is a requirement to access any data managed by those systems.
This new release adds support for Amazon EBS volumes and the ability to diagnose cluster bootstrap errors quickly.
Cloudera Director provides a simple, reliable, enterprise-grade way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. Cloudera Director enables you to deploy production-ready clusters for big data applications and successfully run workloads in the cloud.
Cloudera Director makes it easier for customers to:
- Deploy clusters in line with patterns native to cloud infrastructure
- Use an interface to define in one place the desired cluster specification all the way down to the operating system
- Repeatedly and programmatically instantiate these cluster definitions
- Adapt to the dynamic nature of cloud infrastructure
Cloudera Director 2.2 provides additional mechanisms to get that initial cluster definition right and the ability to diagnose errors and iterate quickly.
As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases.
Impala 2.6 brings read/write support on Amazon S3, which provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility that are unique amongst cloud-based analytic databases. With more and more users looking to deploy and run in public-cloud environments,
The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala.
In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support to read/write data from Amazon S3, Impala provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility not found on other cloud-based analytic databases,