Cloudera Blog · Cloudera Manager Posts
We are pleased to announce that Cloudera Manager 3.7.4 is now available! The most notable updates in this release are:
The Activity Monitoring feature in Cloudera Manager consolidates all Hadoop cluster activities into a single, real-time view. This capability lets you see who is running what activities on the Hadoop cluster, both at the current time and through historical activity views. Activities are either individual MapReduce jobs or those that are part of larger workflows (via Oozie, Hive or Pig).
Activity Monitoring provides many statistics – both in tabular displays and charts – about the resources used by individual Hadoop jobs and at the aggregate cluster level. The Comparison feature in Activity Monitoring shows the performance of the selected Hadoop job compared with the performance of other similar Hadoop jobs.
Cloudera and Cisco jointly announced a reference architecture for running Cloudera’s Distribution Including Apache Hadoop (CDH) and Cloudera Manager on Cisco’s Unified Computing System (UCS) last November. It was the first Apache Hadoop reference architecture assembled by Cisco, and is proudly certified by Cloudera.
I bring a different perspective on the Cloudera-Cisco relationship, as I worked for over five years at Cisco on the software powering the Nexus 5000 series switches and the Cisco Virtual Interface Card. I now work at Cloudera on the HBase team, and can fully appreciate the synergies that the Cloudera and Cisco reference architecture brings to the table.
I know that Cisco is actively investing in development and support for the rack servers and switches that comprise UCS. Working at Cloudera now, I see the tireless effort that goes into each CDH and Cloudera Manager release from leaders in the Apache Hadoop community.
Several weeks ago, I set out to demonstrate the ease with which Solr and Map/Reduce can be integrated. I was unable to find a simple, yet comprehensive, primer on integrating the two technologies, so I decided to write one.
What follows is my bare-bones tutorial on getting Solr up and running to index each word of the complete works of Shakespeare. Note: Special thanks to Sematext for looking over the Solr bits and making sure they are sane. Check them out if you’re going to be doing a lot of work with Solr, ElasticSearch, or search in general and want to bring in the experts.
First things first
I got started by creating a new CentOS 6 virtual machine. You can pick a different flavor of Linux if that suits you; Hadoop should work fine on any (though the recommended distros are SuSE, Ubuntu/Debian, RedHat/CentOS).
In this demo video, BC Wong, a software engineer at Cloudera, discusses the Hadoop Service Monitoring feature in Cloudera Manager. Service Monitoring helps you monitor and manage your Hadoop clusters effectively.
Through the Service Monitoring feature, you can monitor dozens of health and performance metrics about the overall service (HDFS, MapReduce, HBase). You can also examine the underlying role instances (Namenode, Datanodes, JobTracker, TaskTrackers, Region Servers, etc.) in your Hadoop cluster and see what’s going wrong – or what is about to go wrong.
Service Monitoring presents health and performance data in a variety of formats, including interactive charts, through Cloudera’s new, enhanced user interface. Every Service Monitoring page also includes a widget for quickly searching the Events and Logs associated with the service under consideration. Important Event and Log messages are also highlighted in the various charts. You can also monitor metrics against customizable thresholds; crossing a threshold raises an Alert for operators.
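To make the threshold idea concrete, here is a minimal sketch of checking metric samples against warning and critical thresholds. It is an illustration only and not Cloudera Manager's implementation or API: the `Threshold` class, the `evaluate` and `alerts` functions, and the metric names are all hypothetical, and it assumes metrics where a higher value is worse.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Threshold:
    """Hypothetical threshold definition for one metric (higher is worse)."""
    metric: str
    warning: float
    critical: float


def evaluate(value: float, threshold: Threshold) -> str:
    """Classify a single sample as 'critical', 'warning', or 'ok'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def alerts(samples: Dict[str, float],
           thresholds: List[Threshold]) -> List[Tuple[str, str, float]]:
    """Return (metric, level, value) for every sample that is not 'ok'."""
    raised = []
    for t in thresholds:
        if t.metric not in samples:
            continue  # no sample reported for this metric
        level = evaluate(samples[t.metric], t)
        if level != "ok":
            raised.append((t.metric, level, samples[t.metric]))
    return raised


# Made-up metric names and values, for illustration only.
thresholds = [
    Threshold("dfs_capacity_used_pct", warning=75.0, critical=90.0),
    Threshold("failed_datanodes", warning=1.0, critical=3.0),
]
samples = {"dfs_capacity_used_pct": 82.5, "failed_datanodes": 0.0}
raised = alerts(samples, thresholds)
```

In this sketch only the capacity metric raises an alert, at the warning level, because 82.5 sits between its warning and critical thresholds.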
In this demo, Henry Robinson, a software engineer at Cloudera, discusses the Log Management, Event Management and Alerting features in Cloudera Manager that help make sense of all the discrete events that take place across the Hadoop cluster. He demonstrates how to search the logs for valuable information, note important events that pertain to system health and create alerts to warn you when things go wrong.
Every process in a Hadoop cluster regularly writes to a log file, which captures valuable data but also creates volumes of information that are difficult to sort manually. Cloudera Manager’s comprehensive log management feature contextualizes all system logs from across the Hadoop cluster and allows the operator to search and filter by service, role, host, keyword and severity. The application also proactively scans the log files for irregularities and warns you before the Hadoop cluster is impacted.
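The search dimensions listed above (service, role, host, keyword and severity) can be pictured as a simple filter over structured log entries. The sketch below is an assumption-laden illustration, not how Cloudera Manager stores or queries logs: `LogEntry`, `filter_logs`, the severity ordering and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed severity ordering, lowest to highest.
SEVERITIES = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


@dataclass
class LogEntry:
    """Hypothetical structured form of one log line."""
    service: str
    role: str
    host: str
    severity: str
    message: str


def filter_logs(entries: Iterable[LogEntry],
                service: Optional[str] = None,
                role: Optional[str] = None,
                host: Optional[str] = None,
                keyword: Optional[str] = None,
                min_severity: str = "DEBUG") -> List[LogEntry]:
    """Keep entries matching every supplied criterion (None means 'any')."""
    floor = SEVERITIES.index(min_severity)
    matched = []
    for e in entries:
        if service is not None and e.service != service:
            continue
        if role is not None and e.role != role:
            continue
        if host is not None and e.host != host:
            continue
        # Keyword match is a case-insensitive substring test.
        if keyword is not None and keyword.lower() not in e.message.lower():
            continue
        if SEVERITIES.index(e.severity) < floor:
            continue
        matched.append(e)
    return matched


# Made-up entries, for illustration only.
entries = [
    LogEntry("HDFS", "Datanode", "node1", "WARN", "Slow block receiver"),
    LogEntry("HDFS", "Namenode", "node0", "INFO", "Checkpoint complete"),
    LogEntry("MapReduce", "TaskTracker", "node1", "ERROR", "Task failed: disk full"),
]
hdfs_warnings = filter_logs(entries, service="HDFS", min_severity="WARN")
disk_errors = filter_logs(entries, host="node1", keyword="DISK")
```

Treating each criterion as optional keeps the function usable for both narrow searches (one host, one keyword) and broad ones (everything at or above a severity).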
With event management, Cloudera Manager proactively reports on important events in the Hadoop cluster such as a change in service health or metrics, log messages with a certain severity or keyword, or abnormal job performance. It creates and aggregates these relevant Hadoop events, and makes them available for searching and alerting.
Service and Configuration Management (Part I & II)
We’ve recently recorded a series of demo videos intended to highlight the extensive set of features and functions included with Cloudera Manager, the industry’s first end-to-end management application for Apache Hadoop. These demo videos showcase the newly enhanced Cloudera Manager interface and reveal how to use this powerful application to simplify the administration of Hadoop clusters, optimize performance and enhance the quality of service.
In the first two videos of this series, Philip Langdale, a software engineer at Cloudera, walks through Cloudera Manager’s Service and Configuration Management module. He demonstrates how simple it is to set up and configure the full range of Hadoop services in CDH (including HDFS, MR and HBase); enable security; perform configuration rollbacks; and add, delete and decommission nodes.
Part I of the Service and Configuration Management demo focuses on managing services and configuring a cluster for optimal performance. It also demonstrates how to administer users within Cloudera Manager, configure role-based permissions, and better manage security.
If you’re like a myriad of other systems administrators out there, you may be running a production Hadoop cluster, spec’ing one out, or just starting to investigate the possibility of bringing Hadoop into your workplace. As any of these folks will be able to tell you, one of the most important tasks you’ll encounter is capacity planning. With the release of Cloudera Manager 3.7, we’re bringing you a new set of tools to aid you in this process. In this post, we’ll take a look at how you can leverage Cloudera Manager to deal with some common scenarios that you might run into while planning out a Hadoop cluster.
Questions and Patterns
How is my disk usage growing over time?
One very interesting disk usage pattern can be seen in Josh’s recent blog post on his analysis of drug interactions. Josh started with a relatively small data set, containing about one million records. However, during one of the stages of his analytic process, the number of records was blown up from one million to three trillion. Many types of analyses can result in very large intermediate data sets, while the final output may just be a fraction of the intermediate data. The consequence is that there are temporary spikes in disk usage, which need to be understood, in order to appropriately plan out a Hadoop deployment.
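A rough back-of-the-envelope calculation shows why such spikes matter. The sketch below estimates the disk footprint of each stage of a pipeline and reports the peak. The one million and three trillion record counts come from the example above; the 100 bytes per record, the final-stage record count and the replication factor of 3 are assumptions chosen purely for illustration, and whether intermediate data is actually replicated depends on where a given job writes it.

```python
from typing import List


def estimated_bytes(records: int, bytes_per_record: int, replication: int) -> int:
    """Raw disk footprint of one stage: records x record size x copies."""
    return records * bytes_per_record * replication


def peak_disk_usage(stage_records: List[int],
                    bytes_per_record: int,
                    replication: int) -> int:
    """Largest footprint across all stages, assuming one stage on disk at a time."""
    return max(estimated_bytes(n, bytes_per_record, replication)
               for n in stage_records)


# Record counts per stage: input, intermediate blow-up, final output.
# The final output count is a hypothetical placeholder.
stages = [1_000_000, 3_000_000_000_000, 10_000_000]

BYTES_PER_RECORD = 100   # assumption, not from the original analysis
REPLICATION = 3          # assumption; adjust to match your cluster

input_bytes = estimated_bytes(stages[0], BYTES_PER_RECORD, REPLICATION)
peak_bytes = peak_disk_usage(stages, BYTES_PER_RECORD, REPLICATION)
blowup = peak_bytes // input_bytes  # how many times larger the peak is
```

Under these assumptions the input occupies about 300 MB while the intermediate stage peaks at roughly 900 TB, a three-million-fold increase. Sizing a cluster from the input or the final output alone would miss that peak entirely.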
Bala Venkatrao is the Director of Product Management at Cloudera.
As many of you know, we recently launched Cloudera Enterprise 3.7. Here’s the link to the press release. This release marked a transition from Cloudera Management Suite (CMS) to Cloudera Manager (CM), the industry’s first and most comprehensive management application for Apache Hadoop. Over the last month we have received very positive feedback from our customers. I want to thank again all the Clouderans who spent countless hours bringing this product to market. I also want to take this opportunity to thank our customers for helping us get here, as many of them helped us to prioritize the key features for this release. Several customers have also shared the challenges/use cases from their Hadoop deployments and the need for specific features (more later) in Cloudera Manager. Many customers were actively involved in usability testing sessions for Cloudera Manager, which were immensely helpful!
At Cloudera, we strive to listen to our customers and build products that address their needs. We hold regular meetings with customers, sharing early design prototypes and feature ideas, and then quickly iterate on the feedback we receive. Cloudera Manager is the result of this amazing collaboration with our customers, and we look forward to continuing this partnership as we build on our vision to make it even easier for our customers to manage their Hadoop environments.