Learn how to use Cloudera to spin up Apache Hadoop clusters across multiple cloud providers to take advantage of competing prices and avoid infrastructure lock-in.
Why is a multi-cloud strategy important?
In the early days of Cloudera, it was a fair assumption that our software would be running on industry-standard servers that were purchased, owned, and operated by the client in their own data center. In the last few years,
As modern businesses deal with an ever-increasing volume of data and an expanding set of data sources, the data engineering process that enables analysis, visualization, and reporting only becomes more important.
Running data engineering workloads in the public cloud enables operational models that differ from on-premises deployments. The key factors are a distinct storage layer within the cloud environment and the ability to provision compute resources on demand (e.g., Amazon S3 and EC2, respectively).
Cloudera Director 2.4 improves support for long-running clusters by syncing with upgrades and topology changes via Cloudera Manager, and adds support for Spark 2 and Kudu. Together with Cloudera Manager and CDH 5.11, Cloudera Director also adds support for Microsoft Azure Data Lake Store (ADLS) and for pausing clusters that use Amazon EBS volumes.
Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice.
Cloudera Enterprise 5.11 is Now Available
Cloudera is pleased to announce that Cloudera Enterprise 5.11 is now generally available (GA). The highlights of this release include lineage support for Apache Spark, Apache Kudu security integration, embedded data discovery for self-service BI, and new cloud capabilities for Microsoft ADLS and Amazon S3.
As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
- Core Platform and Cloud
  - Amazon S3 Consistency: S3Guard ensures that operations on Amazon S3 are immediately visible to other clients,
Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model is ideal when each cluster stores its data locally along with the metadata. In the cloud, however, many CDH clusters run directly on a shared object store (like Amazon S3), making it possible for the data to live across multiple clusters and beyond any cluster’s lifespan. In this scenario, each cluster must regenerate and coordinate metadata for the underlying shared data individually.