Cloudera Developer Blog · Cloudera Manager Posts
Today Cloudera announced a new Cloudera Academic Partnership program, in which participating universities worldwide get access to curriculum, training, certification, and software.
As noted in the press release, the global demand for people with Apache Hadoop and data science skills is dwarfing all supply. We consider it an important mission to help accredited universities meet that demand, by equipping them with the content and training they need to educate students in the Hadoop arts.
Furthermore, we are cognizant of the fact that many academic research labs are in need of tools to help deploy, manage, and extend Hadoop clusters. For that reason, CAP members get free access to Cloudera Manager Enterprise Edition for 12 months to support data-intensive testing, development, and research.
This guest post comes to us from David Greco, CTO of Eligotech.
Vagrant is a very nice tool for programmatically managing many virtual machines (VMs) on a single physical machine. It natively supports VirtualBox and also provides plugins for VMware Fusion and Amazon EC2, supporting the management of VMs in those environments as well.
Vagrant provides a very easy-to-use, Ruby-based internal DSL that allows the user to define one or more virtual machines together with their configuration parameters. Furthermore, it offers different mechanisms for automatic provisioning: You can use Puppet, Chef, or shell scripts for automating software installation and configuration on the machines defined in the Vagrant configuration file.
Cloudera Manager includes a new express installation wizard for Amazon Web Services (AWS) EC2. Its goal is to enable Cloudera Manager users to provision CDH clusters and Cloudera Impala (the open source distributed query engine for Apache Hadoop) on EC2 as easily as possible (for testing and development purposes only, not supported for production workloads) - and thus is currently the fastest way to provision a Cloudera Manager-managed cluster in EC2.
The new distinguishing feature introduced in version 4.5 is that Cloudera Manager can now launch and configure the instances for you, so you don’t have to worry about launching the instances, authorizing SSH keys, and configuring a firewall. All this can now be done from within Cloudera Manager!
Since Cloudera Manager and the nodes running CDH use internal hostnames to communicate, the Cloudera Manager server must run on EC2 as well. In fact, the Cloud Express Wizard only appears when installing Cloudera Manager on EC2.
Last week Cloudera released the 4.5 release of Cloudera Manager, the leading framework for end-to-end management of Apache Hadoop clusters. (Download Cloudera Manager here, and see install instructions here.) Among many other features, Cloudera Manager 4.5 adds support for Apache Hive. In this post, I’ll explain how to set up a Hive server for use with Cloudera Manager 4.5 (and later).
For details about other new features in this release, please see the full release notes:
It has been a while since I have blogged, primarily because we have been heads-down working toward the Cloudera Manager 4.5 release that we announced yesterday!
Cloudera Manager has seen a rapid adoption among enterprise customers and as more clusters are deployed into production environments, the more feature requests we get from them. We have heard our customers and the Cloudera Manager 4.5 release aims to address many of these requests. Kudos to the engineering team for another feature-packed release.
Some key features of CM4.5 release are as follows:
Today is an exciting day for Cloudera customers and users. With an update to our 100% open source platform and a number of new add-on products, every software component we ship is getting either a minor or major update. There’s a lot to cover and this blog post is only a summary. In the coming weeks we’ll do follow-on blog posts that go deeper into each of these releases.
We’re now supporting several hundred production Hadoop clusters. In doing so we’ve had to make a lot of advances in the functionality, reliability and manageability of the Hadoop platform. Even with these improvements, customers have been traditionally reluctant to run certain data and applications on the Apache Hadoop platform. The new products we are announcing today were designed to remove these obstacles to adoption.
You may have seen the recent announcement from Skytap about the availability of pre-configured CDH4 templates in the Skytap Cloud public template library. So for anyone who wants to try out a Cloudera Hadoop cluster—from small to large—it can now be easily accomplished in Skytap Cloud. The how-to below from Skytap’s Matt Sousely explains how.
The goal of this how-to will be to spin up a 10-node Cloudera Hadoop cluster in Skytap Cloud. To begin, let’s talk about the two new Cloudera Hadoop cluster templates. The first is Cloudera CDH4 Hadoop cluster: a 2-node Hadoop cluster template. It includes 2 nodes and a management node/server. The second is the Cloudera CDH4 Hadoop host template. This second template is not intended to run by itself in a configuration—rather, it contains a host VM that is ready to become another Hadoop node in the Cloudera CDH4 Hadoop cluster template-based configuration.
To start, let’s spin up a Cloudera Hadoop cluster.
- Log in to Skytap Cloud
- Choose the Templates tab
- In the search box, type hadoop
- Select Cloudera CDH4 Hadoop cluster
- Click New Configuration
- Click Run
I am pleased to announce the release of Cloudera Impala Beta (version 0.4) and Cloudera Manager 4.1.3. Key enhancements in each release are:
Cloudera Impala Beta (version 0.4)
Because raising the visibility of Apache Hadoop use cases is so important, in this post we bring you a re-posted story about how and why Rapleaf, a marketing data company based in San Francisco, uses Cloudera Enterprise (CDH and Cloudera Manager).
Founded in 2006, Rapleaf’s mission is to make it incredibly easy for marketers to access the data they need so they can personalize content for their customers. Rapleaf helps clients “fill in the blanks” about their customers by taking contact lists and, in real time, providing supplemental data points, statistics and aggregate charts and graphs that are guaranteed to have greater than 90% accuracy. Rapleaf is powered by Cloudera.
Business Challenges Before Cloudera
Rapleaf established itself as a data driven business early on, collecting feeds from numerous sources to create a single, accurate view of each customer. By 2008, “we were processing data in a complex pipeline that involved an organic structure of many MySQL instances and queues,” explained Rapleaf’s co-founder and vice president of engineering, Jeremy Lizt. “As data volumes increased, that structure became unmanageable and expensive. It started getting difficult to perform the kinds of operations that we wanted to be able to do. It was no secret that this wasn’t going to scale.”
With the availability of this new demo VM containing Cloudera Manager Free Edition and CDH4.1.2 on CentOS 6.2, getting quick hands-on experience with a freeze-dried single-node Apache Hadoop cluster is just a few minutes away after the download process.
This new addition to our growing Demo VM menagerie is available, as usual, in VMware, VirtualBox, and KVM flavors. A 64-bit host OS is required.
A few quick notes from the doc: