Cloudera Engineering Blog · Cloudera Manager Posts
In a prior blog post, Omar explained two important concepts introduced in Cloudera Manager 4.5: Role Groups and Host Templates. In this post, I’ll demonstrate how to use role groups and host templates to easily expand an existing CDH cluster onto heterogeneous hardware. If you haven’t already looked at Omar’s post, I’d recommend doing so before reading this one, as I’ll assume you are familiar with role groups and host templates.
Although these instructions/screenshots are premised on Cloudera Manager 4.5, they are valid for subsequent releases as well.
Initial State and Goal
At Cloudera, we believe that Cloudera Manager is the best way to install, configure, manage, and monitor your Apache Hadoop stack. Of course, most users prefer not to take our word for it — they want to know how Cloudera Manager works under the covers, first.
In this post, I’ll explain some of its inner workings.
The Vocabulary of Cloudera Manager
We’re very pleased to bring you this guest post from Verisign engineer Benoit Perroud, which is based on his personal experiences with the new “Parcel” binary distribution format in Cloudera Manager 4.5.
Among all the new features released with Cloudera Manager 4.5, Parcel is probably one of the most unnoticed – despite the fact it has the potential to become the administrator’s best friend.
Yesterday we announced the availability of Cloudera Manager 4.6. As part of this release, the Free Edition of Cloudera Manager (now a part of Cloudera Standard) has been enhanced significantly to include many features formerly only available with a subscription license:
The news this morning focused on the launch of Cloudera Search, an exciting new capability for our platform that was much anticipated by our customers and engineers. Also released at the same time is a new release of Cloudera Manager (4.6).
Cloudera Manager 4.6 includes a number of enhancements as well as improvements in quality and usability. (A follow-on blog post will do a deep dive on the new features and functions.) Most notable in Cloudera Manager 4.6 is that the free version (included in Cloudera Standard) is greatly enhanced. Cloudera Standard now includes monitoring, health checks, events & alerts, log search, kerberos automation, and multi-cluster support.
Today is a big day: Cloudera is not only urging our customers to “Unaccept the Status Quo” (the continued and accelerating spending on data warehousing, expensive data storage, and associated software licenses), but we also announced that Cloudera Search has entered public beta. Now anyone who knows how to do a Google search can query data stored in Cloudera’s Platform for Big Data.
In this post, however, I’d like to explain the new, simpler product naming/packaging structure that will make adopting and deploying Cloudera more straightforward.
Introducing Cloudera Standard
Helping users manage hundreds of configurations for the growing family of Apache Hadoop services has always been one of Cloudera Manager’s main goals. Prior to version 4.5, it was possible to set configurations at the service (e.g. hdfs), role type (e.g. all datanodes), or individual role level (e.g. the datanode on machine17). An individual role would inherit the configurations set at the service and role-type levels. Configurations made at the role level would override those from the role-type level. While this approach offers flexibility when configuring clusters, it was tedious to configure subsets of roles in the same way.
In Cloudera Manager 4.5, this issue is addressed with the introduction of role groups. For each role type, you can create role groups and assign configurations to them. The members of those groups then inherit those configurations. For example, in a cluster with heterogeneous hardware, a datanode role group can be created for each host type and the datanodes running on those hosts can be assigned to their corresponding role group. That makes it possible to tweak the configurations for all the datanodes running on the same hardware by modifying the configurations of one role group.
Have you ever wished you could upgrade to the latest CDH minor release with just a few mouse clicks, and even without taking any downtime on your cluster? Well, with Cloudera Manager 4.5 and its new “Parcel” feature, you can!
That release introduced many new features and capabilities related to parcels, and in this FAQ-oriented post, you will learn about most of them.
What are parcels?
One of the complexities of Apache Hadoop is the need to deploy clusters of servers, potentially on a regular basis. At Cloudera, which at any time maintains hundreds of test and development clusters in different configurations, this process presents a lot of operational headaches if not done in an automated fashion. In this post, I’ll describe an approach to cluster automation that works for us, as well as many of our customers and partners.
At Cloudera engineering, we have a big support matrix: We work on many versions of CDH (multiple release trains, plus things like rolling upgrade testing), and CDH works across a wide variety of OS distros (RHEL 5 & 6, Ubuntu Precise & Lucid, Debian Squeeze, and SLES 11), and complex configuration combinations — highly available HDFS or simple HDFS, Kerberized or non-secure, using YARN or MR1 as the execution framework, etc. Clearly, we need an easy way to spin-up a new cluster that has the desired setup, which we can subsequently use for integration, testing, customer support, demos, and so on.
As Cloudera’s keeper of customer stories, it’s dawned on me that others might benefit from the information I’ve spent the past year collecting: the many use cases and deployment patterns for Hadoop amongst our customer base.
This week I’d like to highlight Nokia, a global company that we’re all familiar with as a large mobile phone provider, and whose Senior Director of Analytics – Amy O’Connor – will be speaking at tomorrow’s Cloudera Sessions event in Boston.