It has been a while since I have blogged, primarily because we have been heads-down working toward the Cloudera Manager 4.5 release that we announced yesterday!
Cloudera Manager has seen rapid adoption among enterprise customers, and as more clusters are deployed into production environments, we get more feature requests from them. We have heard our customers, and the Cloudera Manager 4.5 release aims to address many of these requests. Kudos to the engineering team for another feature-packed release.
Some key features of the CM4.5 release are as follows:
Traditionally, upgrading a Hadoop/CDH cluster has been a cumbersome exercise for some customers. With Cloudera Manager 4.5, we are addressing the challenge of platform upgrades head-on: Users can now download, distribute, and activate the latest release of CDH completely from within Cloudera Manager. Cloudera Manager handles all the orchestration needed to make sure the right bits are deployed on every node of the Hadoop cluster. This feature is facilitated by a new packaging format that we are introducing, called a “parcel”.
Coupled with the Rolling Restarts workflow, Cloudera Manager 4.5 enables the cluster to be upgraded with minimal downtime. In addition, the Rolling Restarts workflow is especially useful for propagating configuration updates for services (HDFS, HBASE, MR etc.) with minimal/zero downtime, as it cycles through the entire cluster and updates the appropriate configurations on a node-by-node basis.
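To make the node-by-node idea concrete, here is a minimal sketch of a rolling restart loop in Python. It is not Cloudera Manager's implementation; the function name `rolling_restart` and the `restart` and `is_healthy` callbacks are hypothetical stand-ins for whatever actually restarts a role and checks its health. The point is only that one small batch is cycled at a time, and the rollout halts as soon as a restarted host fails its check.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: restarts the roles on one host
    is_healthy: Callable[[str], bool],   # hypothetical: health check after restart
    batch_size: int = 1,
) -> List[str]:
    """Restart hosts one batch at a time and return the hosts restarted.

    Only `batch_size` hosts are down at any moment, so the rest of the
    cluster keeps serving. The rollout stops at the first unhealthy host.
    """
    host_list = list(hosts)
    restarted: List[str] = []
    for start in range(0, len(host_list), batch_size):
        batch = host_list[start:start + batch_size]
        for host in batch:
            restart(host)
        for host in batch:
            if not is_healthy(host):
                raise RuntimeError(f"{host} did not come back healthy; halting rollout")
        restarted.extend(batch)
    return restarted
```

Calling it with `batch_size=1` gives the strictest form of the workflow, where a single host is cycled before the next one is touched.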
Monitoring has been one of the core value adds of Cloudera Manager. Hadoop has tons of metrics but making sense of them requires a fair degree of expertise. Cloudera Manager brings Hadoop intelligence right into the tool, and makes Cloudera’s collective experience of managing several Hadoop clusters directly available to the users.
One of the gaps we noticed in previous releases was the inability to effectively correlate the various metrics across the stack, from services to roles to hosts. Customers ended up viewing related metrics on separate views and then trying to make sense of all of them. With 4.5’s new charting capabilities, you can now review all the relevant metrics (services such as HDFS, HBase, and MR; roles such as DataNodes, TaskTrackers, and RegionServers; and hosts) on the same page. So next time you are experiencing a latency spike on your HBase cluster, the advanced charts will quickly help you pinpoint the problem from within a single view – a genuine example of “breaking the silos”.
The team has also delivered a set of additional capabilities that make the lives of Hadoop administrators much easier. For example, we have built a simple query language that enables (advanced) users to query the myriad metrics available within Cloudera Manager and create custom charts that are relevant to their clusters. These dashboards can be saved and shared, both within the organization and with Cloudera Support, to troubleshoot and diagnose problems more quickly.
Node templating has been a direct result of several customer requests (especially from those with several hundred Hadoop nodes). Typically, customers start off with a homogeneous set of hardware for their Hadoop clusters but over time end up acquiring newer generations of hardware with different specs (CPUs, memory, etc.). This mixed setup necessitates a more streamlined approach to managing groups of disparate hardware.
The Node Templating feature in 4.5 provides this ability to define specific role types and create custom templates based on these role types (example: Template.Large, Template.Medium, Template.Small, etc). These templates can then be applied to the groups of hardware. This greatly simplifies the process of adding hosts and instantiating the roles that should run on those hosts.
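As a rough illustration of the idea, a template can be thought of as a named set of role types that is stamped onto every host in a hardware group. The Python sketch below uses hypothetical names (`HostTemplate`, `apply_template`) and an assumed role list; it is not the Cloudera Manager data model or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HostTemplate:
    """Hypothetical template: a name plus the role types it places on a host."""
    name: str
    roles: Tuple[str, ...]


def apply_template(template: HostTemplate, hosts: Iterable[str]) -> Dict[str, List[str]]:
    """Return a host -> roles mapping with the template's roles on every host."""
    return {host: list(template.roles) for host in hosts}


# Assumed example: larger machines run all three worker roles.
large = HostTemplate("Template.Large", ("DATANODE", "TASKTRACKER", "REGIONSERVER"))
assignments = apply_template(large, ["node01", "node02"])
```

Adding a new batch of identical machines then amounts to one more `apply_template` call with the matching template, rather than assigning roles host by host.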
From the beginning, we have emphasized the need for Cloudera Manager to integrate with the broader ecosystem of IT management tools in the data center. In CM3.7, we had alerts available via SMTP. In CM4.0, we added a rich set of APIs to the application to enable integration with popular tools like Zenoss and Nagios. With CM4.5, we are making it even easier to integrate with enterprise-management tools like IBM Tivoli, HP OpenView, and others via SNMP (v2 or v3).
In addition to the above core features, we have added a whole set of new capabilities around Hive management, resource management, support for auto-provisioning of clusters on Amazon EC2, automated cluster stats, and a ton of usability improvements. More details are available here.
For current Cloudera Manager users, I am sure you are looking to upgrade to 4.5 right away! For others, Cloudera Manager is the easiest and most effective way to build your Hadoop clusters. So get started!
Bala Venkatrao is a product director at Cloudera.