Cloudera Blog · Cloudera Manager Posts

How Rapleaf Works Smarter with Cloudera

Because raising the visibility of Apache Hadoop use cases is so important, in this post we bring you a re-posted story about how and why Rapleaf, a marketing data company based in San Francisco, uses Cloudera Enterprise (CDH and Cloudera Manager).

Founded in 2006, Rapleaf’s mission is to make it incredibly easy for marketers to access the data they need so they can personalize content for their customers. Rapleaf helps clients “fill in the blanks” about their customers by taking contact lists and, in real time, providing supplemental data points, statistics and aggregate charts and graphs that are guaranteed to have greater than 90% accuracy. Rapleaf is powered by Cloudera.

Business Challenges Before Cloudera

Rapleaf established itself as a data-driven business early on, collecting feeds from numerous sources to create a single, accurate view of each customer. By 2008, “we were processing data in a complex pipeline that involved an organic structure of many MySQL instances and queues,” explained Rapleaf’s co-founder and vice president of engineering, Jeremy Lizt. “As data volumes increased, that structure became unmanageable and expensive. It started getting difficult to perform the kinds of operations that we wanted to be able to do. It was no secret that this wasn’t going to scale.”

New: Cloudera Manager Free Edition Demo VM

With the availability of this new demo VM containing Cloudera Manager Free Edition and CDH4.1.2 on CentOS 6.2, hands-on experience with a freeze-dried single-node Apache Hadoop cluster is just a few minutes away once the download completes.

This new addition to our growing Demo VM menagerie is available, as usual, in VMware, VirtualBox, and KVM flavors. A 64-bit host OS is required.

A few quick notes from the doc:

Cloudera Impala Beta (version 0.3) and Cloudera Manager 4.1.2 Now Available

I am pleased to announce the release of Cloudera Impala Beta (version 0.3) and Cloudera Manager 4.1.2. Key enhancements in each release are:

Cloudera Impala Beta (version 0.3)

Cloudera Impala Beta (version 0.2) and Cloudera Manager 4.1.1 Now Available

I am pleased to announce the release of Cloudera Impala Beta (version 0.2) and Cloudera Manager 4.1.1. These are both enhancement releases to make bug fixes available quickly. Key enhancements in each release are:

Cloudera Impala Beta (version 0.2)

Cloudera Manager 4.1 Now Available; Supports Impala Beta Release

I am very pleased to announce the availability of Cloudera Manager 4.1. This release adds support for the Cloudera Impala beta release, and management and monitoring of key CDH features.

Here are the highlights of Cloudera Manager 4.1:

Axemblr’s Java Client for the Cloudera Manager API

Axemblr, purveyors of a cloud-agnostic MapReduce Web Service, have recently announced the availability of an Apache-licensed Java Client for the Cloudera Manager API.

The task at hand, according to Axemblr, is to “deploy Hadoop on Cloud with as little user interaction as possible. We have the code to provision the hosts but we still need to install and configure Hadoop on all nodes and make it so the user has a nice experience doing it.” And voilà: the answer is Cloudera Manager, with the process made easy via the REST API introduced in Cloudera Manager 4.0.

As Axemblr puts it: “In the pursuit of our greatest desire (second only to coffee early in the morning), we ended up writing a Java client for Cloudera Manager’s API. Thus we achieved to automate a CDH3 Hadoop installation on Amazon EC2 and Rackspace Cloud. We also decided to open source the client so other people can play along.”
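To give a feel for what a client like Axemblr’s wraps, here is a minimal sketch of a single Cloudera Manager API call. It is not Axemblr’s Java client; it uses only Python’s standard library. The host name, the port (7180, the usual Cloudera Manager web port), the `admin`/`admin` credentials, the `v1` API version, and the assumption that list responses arrive as a JSON object with an `"items"` array are all illustrative assumptions to check against the API documentation for your Cloudera Manager release.

```python
import base64
import json
import urllib.request


def build_clusters_request(host, user, password, port=7180, api_version="v1"):
    """Build (but do not send) a GET request for the cluster list.

    Assumptions: port 7180, API version "v1", HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/api/{api_version}/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def parse_cluster_names(body):
    """Extract cluster names, assuming list results are wrapped in "items"."""
    return [cluster["name"] for cluster in body.get("items", [])]


def list_cluster_names(req, timeout=10):
    """Send the request to a live Cloudera Manager and return cluster names."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_cluster_names(json.load(resp))
```

Against a running server, `list_cluster_names(build_clusters_request("cm-host.example.com", "admin", "admin"))` would return the managed cluster names. Keeping request construction and response parsing separate from the network call makes each piece easy to check without a live Cloudera Manager instance.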

How-to: Set Up an Apache Hadoop/Apache HBase Cluster on EC2 in (About) an Hour

Today we bring you one user’s experience using Apache Whirr to spin up a CDH cluster in the cloud. This post was originally published here by George London (@rogueleaderr) based on his personal experiences; he has graciously allowed us to bring it to you here as well in a condensed form. (Note: the configuration described here is intended for learning/testing purposes only.)

I’m going to walk you through a (relatively) simple set of steps that will get you running MapReduce programs on a cloud-based, six-node distributed Apache Hadoop/Apache HBase cluster as fast as possible. This is all based on what I’ve picked up on my own, so if you know of better/faster methods, please let me know in the comments!

We’re going to run our cluster on Amazon EC2, launching it with Apache Whirr and configuring it with Cloudera Manager Free Edition. Then we’ll run some basic programs I’ve posted on GitHub that will parse data and load it into Apache HBase.
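Whirr reads its cluster definition from a Java-style properties file. The sketch below renders one for a six-node EC2 cluster. The property names `whirr.cluster-name`, `whirr.instance-templates`, `whirr.provider`, `whirr.identity`, and `whirr.credential` come from Whirr’s configuration. The cluster name is hypothetical. So is the use of Whirr’s `noop` role to launch bare hosts so that Cloudera Manager, not Whirr, installs CDH; this is an assumption, not necessarily the exact recipe used in the original post.

```python
def whirr_properties(cluster_name, num_nodes=6, provider="aws-ec2"):
    """Render a minimal Whirr properties file as text.

    Assumption: the "noop" role launches plain instances and leaves
    Hadoop/HBase installation to Cloudera Manager.
    """
    props = {
        "whirr.cluster-name": cluster_name,
        "whirr.instance-templates": f"{num_nodes} noop",
        "whirr.provider": provider,
        # Credentials are read from environment variables instead of being
        # hard-coded in the file.
        "whirr.identity": "${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential": "${env:AWS_SECRET_ACCESS_KEY}",
    }
    # One key=value pair per line, with a trailing newline.
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

You could write the result to a file such as `hadoop.properties` and pass it to `whirr launch-cluster --config hadoop.properties`. Once the hosts are up, you would point Cloudera Manager Free Edition at them to install and configure CDH.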

Videos: Get Started with Hadoop Using Cloudera Enterprise

Our video animation factory has been busy lately. The embedded player below contains our two latest ones stitched together:

Get Started with Hadoop Using Cloudera Enterprise, Part 1 

Meet the Engineer: Jon Natkins

In this installment of “Meet the Engineers”, meet Jonathan Natkins, known as “Natty” to his friends and colleagues.

What do you do at Cloudera, and in which Apache project are you involved?

For the last year and a half, I’ve been an engineer on the Enterprise team. We’re the guys who build Cloudera Manager, and all the goodies that make it easy to manage and administer Apache Hadoop clusters. Specifically, I’ve worked on a number of things across the product, like scale and performance for the databases underlying the various monitoring tools available in the Enterprise edition of Cloudera Manager. I’ve also worked extensively on our operational reporting and HDFS file search capabilities. While I don’t work full-time on any of the Apache projects, I have been known to contribute to Apache Hive and Hadoop on rainy days.

Community Meetups at Strata + Hadoop World 2012

Strata Conference + Hadoop World (Oct. 23-25 in New York City) is a bonanza for Hadoop and big data enthusiasts – but not only because of the technical sessions and tutorials. It’s also an important gathering place for the developer community, most of whom are eager to share info from their experiences in the “trenches”.

Just to make that process easier, Cloudera is teaming up with local meetups during that week to organize a series of meetings on a variety of topics. (If for no other reason, stop into one of these meetups for a chance to grab a coveted Cloudera t-shirt.)

As you can see, these meetups are highly parallel, so you will either have to make careful choices or have very quick feet. The good news is: there’s something for everybody.
