Recently, GoDataDriven installed a Cloudera Enterprise (CDH + Cloudera Manager) cluster on Microsoft Azure. This two-part series (republished with permission) covers the use case, design, and installation.
Processing large amounts of unstructured data requires serious computing power as well as maintenance effort. Because the load on computing power typically fluctuates with time of day, season, and processes that run at certain times, a cloud solution like Microsoft Azure is a good option: it scales up easily, and you pay only for what is actually used.
New functionality includes support for spot instances, automatic job submission, and integrated setup for HA and Kerberized clusters.
Cloudera Director is the manifestation of Cloudera’s commitment to providing a simple and reliable way to deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Cloudera Director lets you deploy production-ready clusters for big data applications and successfully run workloads in the cloud. With Cloudera Director 2.0,
Spark Dataflow from Cloudera Labs is now part of Google’s new Dataflow SDK, which will be proposed to the Apache Incubator.
Spark Dataflow is an experimental implementation of Google’s Dataflow programming model that runs on Apache Spark. The initial implementation was written by Josh Wills, and entered Cloudera Labs exactly a year ago. Since then, we’ve seen a number of contributions to the project, culminating in the recent addition of an implementation of streaming (running on Spark Streaming) by Amit Sela from PayPal.
Cloudera Director 1.5 introduces a new plugin architecture to enable support for additional cloud providers. If you want to implement a plugin that adds integration with a cloud provider that is not supported out of the box, or to extend one of the existing plugins, these details will get you started.
As discussed in our previous blog post, the Cloudera Director Service Provider Interface (Cloudera Director SPI) defines a Java interface and packaging standards for Cloudera Director plugins.
Cloudera Director 1.5 is now available; this post describes what’s inside, including a new open source plugin interface.
Cloudera Director is the manifestation of Cloudera’s commitment to providing a simple and reliable way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. With Cloudera Director 1.5, we continue the story of enabling production-ready clusters and big data applications by focusing on the following themes.