Cloudera Engineering Blog · Cloudera Manager Posts

How-to: Automate Your Hadoop Cluster from Java

One of the complexities of Apache Hadoop is the need to deploy clusters of servers, potentially on a regular basis. At Cloudera, which at any time maintains hundreds of test and development clusters in different configurations, this process presents a lot of operational headaches if not done in an automated fashion. In this post, I’ll describe an approach to cluster automation that works for us, as well as many of our customers and partners.

Taming Complexity

At Cloudera engineering, we have a big support matrix: We work on many versions of CDH (multiple release trains, plus things like rolling upgrade testing), and CDH works across a wide variety of OS distros (RHEL 5 & 6, Ubuntu Precise & Lucid, Debian Squeeze, and SLES 11), and complex configuration combinations — highly available HDFS or simple HDFS, Kerberized or non-secure, using YARN or MR1 as the execution framework, etc. Clearly, we need an easy way to spin-up a new cluster that has the desired setup, which we can subsequently use for integration, testing, customer support, demos, and so on.

Customer Spotlight: Nokia’s Big Data Ecosystem Connects Cloudera, Teradata, Oracle, and Others

As Cloudera’s keeper of customer stories, it’s dawned on me that others might benefit from the information I’ve spent the past year collecting: the many use cases and deployment patterns for Hadoop amongst our customer base.

This week I’d like to highlight Nokia, a global company that we’re all familiar with as a large mobile phone provider, and whose Senior Director of Analytics – Amy O’Connor – will be speaking at tomorrow’s Cloudera Sessions event in Boston.

Cloudera Academic Partnership Program: Creating Hadoop Lovers in Universities Worldwide

Today Cloudera announced a new Cloudera Academic Partnership program, in which participating universities worldwide get access to curriculum, training, certification, and software. 

As noted in the press release, the global demand for people with Apache Hadoop and data science skills is dwarfing all supply. We consider it an important mission to help accredited universities meet that demand, by equipping them with the content and training they need to educate students in the Hadoop arts.

How-to: Use Vagrant to Set Up a Virtual Hadoop Cluster (For CDH 4)

This guest post comes to us from David Greco, CTO of Eligotech. For a how-to on this subject for CDH 5, see this post.

Vagrant is a very nice tool for programmatically managing many virtual machines (VMs) on a single physical machine. It natively supports VirtualBox and also provides plugins for VMware Fusion and Amazon EC2, supporting the management of VMs in those environments as well.

How-to: Create a CDH Cluster on Amazon EC2 via Cloudera Manager

Editor’s Note (added Feb. 28, 2014): The instructions below are deprecated for Cloudera Manager releases beyond 4.5. Please refer to this doc for instructions pertaining to releases 4.6 and later.

Cloudera Manager includes a new express installation wizard for Amazon Web Services (AWS) EC2. Its goal is to enable Cloudera Manager users to provision CDH clusters and Cloudera Impala (the open source distributed query engine for Apache Hadoop) on EC2 as easily as possible (for testing and development purposes only, not supported for production workloads) - and thus is currently the fastest way to provision a Cloudera Manager-managed cluster in EC2.

How-to: Set Up Cloudera Manager 4.5 for Apache Hive

Last week Cloudera released the 4.5 release of Cloudera Manager, the leading framework for end-to-end management of Apache Hadoop clusters. (Download Cloudera Manager here, and see install instructions here.) Among many other features, Cloudera Manager 4.5 adds support for Apache Hive. In this post, I’ll explain how to set up a Hive server for use with Cloudera Manager 4.5 (and later).

For details about other new features in this release, please see the full release notes:

What’s New in Cloudera Manager 4.5?

It has been a while since I have blogged, primarily because we have been heads-down working toward the Cloudera Manager 4.5 release that we announced yesterday!

Cloudera Manager has seen a rapid adoption among enterprise customers and as more clusters are deployed into production environments, the more feature requests we get from them. We have heard our customers and the Cloudera Manager 4.5 release aims to address many of these requests. Kudos to the engineering team for another feature-packed release.

New Products and Releases: Cloudera Navigator, Cloudera Enterprise BDR, and More

Today is an exciting day for Cloudera customers and users. With an update to our 100% open source platform and a number of new add-on products, every software component we ship is getting either a minor or major update. There’s a lot to cover and this blog post is only a summary. In the coming weeks we’ll do follow-on blog posts that go deeper into each of these releases.

New Products

How-to: Deploy a CDH Cluster in Skytap Cloud

You may have seen the recent announcement from Skytap about the availability of pre-configured CDH4 templates in the Skytap Cloud public template library. So for anyone who wants to try out a Cloudera Hadoop cluster—from small to large—it can now be easily accomplished in Skytap Cloud. The how-to below from Skytap’s Matt Sousely explains how.

The goal of this how-to will be to spin up a 10-node Cloudera Hadoop cluster in Skytap Cloud. To begin, let’s talk about the two new Cloudera Hadoop cluster templates. The first is Cloudera CDH4 Hadoop cluster: a 2-node Hadoop cluster template. It includes 2 nodes and a management node/server. The second is the Cloudera CDH4 Hadoop host template. This second template is not intended to run by itself in a configuration—rather, it contains a host VM that is ready to become another Hadoop node in the Cloudera CDH4 Hadoop cluster template-based configuration.

Cloudera Impala Beta (version 0.4) and Cloudera Manager 4.1.3 Now Available

I am pleased to announce the release of Cloudera Impala Beta (version 0.4) and Cloudera Manager 4.1.3. Key enhancements in each release are:

Cloudera Impala Beta (version 0.4)

Newer Posts Older Posts