Cloudera Engineering Blog · Cloudera Manager Posts
I am pleased to announce the release of Cloudera Impala Beta (version 0.2) and Cloudera Manager 4.1.1. These are both enhancement releases to make bug fixes available quickly. Key enhancements in each release are:
Cloudera Impala Beta (version 0.2)
I am very pleased to announce the availability of Cloudera Manager 4.1. This release adds support for the Cloudera Impala beta release, and management and monitoring of key CDH features.
Here are the highlights of Cloudera Manager 4.1:
Axemblr, purveyors of a cloud-agnostic MapReduce Web Service, have recently announced the availability of an Apache-licensed Java Client for the Cloudera Manager API.
The task at hand, according to Axemblr, is to “deploy Hadoop on Cloud with as little user interaction as possible. We have the code to provision the hosts but we still need to install and configure Hadoop on all nodes and make it so the user has a nice experience doing it.” And voila, the answer is Cloudera Manager, with the process made easy via the REST API introduced in Release 4.0.
Note (added July 8, 2013): The information below is deprecated; we suggest that you refer to this post for current instructions.
Today we bring you one user’s experience using Apache Whirr to spin up a CDH cluster in the cloud. This post was originally published here by George London (@rogueleaderr) based on his personal experiences; he has graciously allowed us to bring it to you here as well in a condensed form. (Note: the configuration described here is intended for learning/testing purposes only.)
Our video animation factory has been busy lately. The embedded player below contains our two latest ones stitched together:
What do you do at Cloudera, and in which Apache project are you involved?
Strata Conference + Hadoop World (Oct. 23-25 in New York City) is a bonanza for Hadoop and big data enthusiasts – but not only because of the technical sessions and tutorials. It’s also an important gathering place for the developer community, most of whom are eager to share info from their experiences in the “trenches”.
Just to make that process easier, Cloudera is teaming up with local meetups during that week to organize a series of meetings on a variety of topics. (If for no other reason, stop into one of these meetups for a chance to grab a coveted Cloudera t-shirt.)
What’s to love about Cloudera Enterprise? A lot! But rather than bury you in documentation today, we’d rather bring you a less-than-two-minute-long video:
API access is a new feature introduced in Cloudera Manager 4.0 (download the free edition here). Although not visible in the UI, this feature is very powerful, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics). This article walks through an example of setting up a 4-node HDFS and MapReduce cluster via the Cloudera Manager (CM) API.
Cloudera Manager API Basics
The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. The API supports HTTP Basic Authentication, accepting the same users and credentials as the web UI. API users have the same privileges as they do in the web UI.
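As a rough illustration of the basics above, here is a minimal Python sketch that prepares an authenticated JSON request using only the standard library. The host name, port 7180, the `/api/v1/clusters` path, and the `admin`/`admin` credentials are assumptions made for the example, not values taken from this post; substitute whatever your own CM web UI uses. The helper names `build_cm_request` and `fetch_json` are hypothetical as well.

```python
import base64
import json
import urllib.request


def build_cm_request(host, port, path, username, password):
    """Prepare a GET request that carries HTTP Basic Authentication."""
    # The API shares the web UI's host and port, so only the path differs.
    url = f"http://{host}:{port}{path}"
    # Basic auth is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def fetch_json(request):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed values for illustration only; no network call is made here.
    req = build_cm_request("cm-host.example.com", 7180, "/api/v1/clusters", "admin", "admin")
    print(req.full_url)
```

Calling `fetch_json(req)` against a running Cloudera Manager instance would return the decoded response; building and sending are kept separate so the request can be inspected without a server.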
For those new to it, Cloudera Manager is the first and market-leading management platform for CDH (Cloudera’s Distribution Including Apache Hadoop). Enterprise customers are coming to expect an end-to-end tool that manages the entire lifecycle of their Hadoop operations. In fact, in a recent Cloudera customer survey, an overwhelming 95% emphasized the need for this approach.
In this installment of “Meet the Engineer”, we meet with Eric Sammer (invariably known as just plain “Sammer”), Apache committer and author of the upcoming O’Reilly book, Hadoop Operations.
What do you do at Cloudera, and in which Apache project are you involved?
Cloudera Manager 4.0.4 and Cloudera Manager 3.7.8 are now available! These are enhancement releases for Cloudera Manager 4.x and Cloudera Manager 3.7.x respectively. Key enhancements include:
Cloudera Manager 4.0.4
We are pleased to announce the availability of Cloudera Manager 4.0.3. This is an enhancement release, with several improvements to configurability and usability. Some key enhancements include:
At 5 pm PDT on June 30, a leap second was added to Coordinated Universal Time (UTC). Within an hour, Cloudera Support started receiving reports of systems running at 100% CPU utilization. The Support Team worked quickly to understand and diagnose the problem and soon published a solution. Bugs due to the leap second coupled with the Amazon Web Services outage would make this Cloudera’s busiest support weekend to date.
Since Hadoop is written in Java and closely interoperates with the underlying OS, Cloudera Support troubleshoots not only all 17 components in the Hadoop ecosystem, but also any underlying Linux and Java bugs. Last weekend many of our customers were affected by the now infamous “leap second” bugs. Initially, many assumed that Java and Linux would process the leap second gracefully. However, we soon discovered that this wasn’t the case and depending on the version of Linux being used, several distinct issues were observed.
I’m very pleased to announce the immediate General Availability of CDH4 and Cloudera Manager 4 (part of the Cloudera Enterprise 4.0 subscription). These releases are an exciting milestone for Cloudera customers, Cloudera users and the open source community as a whole.
Both CDH4 and Cloudera Manager 4 are chock full of new features. Many new features will appeal to enterprises looking to move more important workloads onto the Apache Hadoop platform. CDH4 includes high availability for the filesystem, ability to support multiple namespaces, Apache HBase table and column level security, improved performance, HBase replication and greatly improved usability and browser support for the Hue web interface. Cloudera Manager 4 includes multi-cluster and multi-version support, automation for high availability and MapReduce2, multi-namespace support, cluster-wide heatmaps, host monitoring and automated client configurations.
We are pleased to announce that Cloudera Manager 3.7.6 is now available! The most notable updates in this release are:
We’re happy to announce the Beta release of Cloudera Manager 4.0.
We are pleased to announce that Cloudera Manager 3.7.4 is now available! The most notable updates in this release are:
The Activity Monitoring feature in Cloudera Manager consolidates all Hadoop cluster activities into a single, real-time view. This capability lets you see who is running what activities on the Hadoop cluster, both at the current time and through historical activity views. Activities are either individual MapReduce jobs or those that are part of larger workflows (via Oozie, Hive or Pig).
Cloudera and Cisco jointly announced a reference architecture for running Cloudera’s Distribution Including Apache Hadoop (CDH) and Cloudera Manager on Cisco’s Unified Computing System (UCS) last November. It was the first Apache Hadoop reference architecture assembled by Cisco, and is proudly certified by Cloudera.
I bring a different perspective on the Cloudera-Cisco relationship, as I worked for over five years at Cisco on the software powering the Nexus 5000 series switches and the Cisco Virtual Interface Card. I now work at Cloudera on the HBase team, and can fully appreciate the synergies that the Cloudera and Cisco reference architecture brings to the table.
Several weeks ago, I set out to demonstrate the ease with which Solr and Map/Reduce can be integrated. I was unable to find a simple, yet comprehensive, primer on integrating the two technologies. So I set out to write one.
What follows is my bare-bones tutorial on getting Solr up and running to index each word of the complete works of Shakespeare. Note: Special thanks to Sematext for looking over the Solr bits and making sure they are sane. Check them out if you’re going to be doing a lot of work with Solr, ElasticSearch, or search in general and want to bring in the experts.
First things first
In this demo video, BC Wong, a software engineer at Cloudera, discusses the Hadoop Service Monitoring feature in Cloudera Manager. Service Monitoring helps you monitor and manage your Hadoop clusters effectively.
Through the Service Monitoring feature, customers can monitor dozens of service health and performance metrics about the overall service (HDFS, MapReduce, HBase). They can also examine underlying role instances (Namenode, Datanodes, JobTracker, TaskTrackers, Region Servers etc.) in their Hadoop cluster and see what’s going wrong – or what is about to go wrong.
In this demo, Henry Robinson, a software engineer at Cloudera, discusses the Log Management, Event Management and Alerting features in Cloudera Manager that help make sense out of all the discrete events that take place across the Hadoop cluster. He demonstrates how to search the logs for valuable information, note important events that pertain to system health and create alerts to warn you when things go wrong.
Every process in a Hadoop cluster regularly writes to a log file, which captures valuable data but also creates volumes of information that are difficult to sort manually. Cloudera Manager’s comprehensive log management feature contextualizes all system logs from across the Hadoop cluster and allows the operator to search and filter by service, role, host, keyword and severity. The application also proactively scans the log files for irregularities and warns you before the Hadoop cluster is impacted.
Service and Configuration Management (Part I & II)
We’ve recently recorded a series of demo videos intended to highlight the extensive set of features and functions included with Cloudera Manager, the industry’s first end-to-end management application for Apache Hadoop. These demo videos showcase the newly enhanced Cloudera Manager interface and reveal how to use this powerful application to simplify the administration of Hadoop clusters, optimize performance and enhance the quality of service.
In the first two videos of this series, Philip Langdale, a software engineer at Cloudera, walks through Cloudera Manager’s Service and Configuration Management module. He demonstrates how simple it is to set up and configure the full range of Hadoop services in CDH (including HDFS, MR and HBase); enable security; perform configuration rollbacks; and add, delete and decommission nodes.
If you’re like a myriad of other systems administrators out there, you may be running a production Hadoop cluster, spec’ing one out, or just starting to investigate the possibility of bringing Hadoop into your workplace. As any of these folks will be able to tell you, one of the most important tasks you’ll encounter is capacity planning. With the release of Cloudera Manager 3.7, we’re bringing you a new set of tools to aid you in this process. In this post, we’ll take a look at how you can leverage Cloudera Manager to deal with some common scenarios that you might run into while planning out a Hadoop cluster.
Questions and Patterns
How is my disk usage growing over time?
Bala Venkatrao is the Director of Product Management at Cloudera.
As many of you know, we recently launched Cloudera Enterprise 3.7. Here’s the link to the press release. This release marked a transition from Cloudera Management Suite (CMS) to Cloudera Manager (CM), the industry’s first and most comprehensive management application for Apache Hadoop. Over the last month we have received very positive feedback from our customers. I want to thank again all the Clouderans who spent countless hours bringing this product to market. I also want to take this opportunity to thank our customers for helping us get here, as many of them helped us to prioritize the key features for this release. Several customers have also shared the challenges/use cases from their Hadoop deployments and the need for specific features (more later) in Cloudera Manager. Many customers were actively involved in usability testing sessions for Cloudera Manager, which were immensely helpful!