Cloudera Developer Blog · Hadoop Posts

Getting MapReduce 2 Up to Speed

Thanks to the improvements described here, CDH 5 will ship with a version of MapReduce 2 that is just as fast as (or faster than) MapReduce 1.

Performance fixes are tiny, easy, and boring, once you know what the problem is. The hard work is in putting your finger on that problem: narrowing, drilling down, and measuring, measuring, measuring.

Cloudera Enterprise 5 Beta 2 is Available: More New Features and Components

Cloudera has released the Beta 2 version of Cloudera Enterprise 5 (which comprises CDH 5.0.0 and Cloudera Manager 5.0.0).

This release (download) contains a number of new features and component versions including the ones below:

This Month in the Ecosystem (January 2014)

Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from January 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

How-to: Write and Run Giraph Jobs on Hadoop

Create a test environment for writing and testing Giraph jobs, or just for playing around with Giraph and small sample datasets.

Apache Giraph is a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes. Companies such as Facebook and PayPal use Giraph to help represent and analyze the billions (or even trillions) of connections across massive datasets. Giraph was inspired by Google’s Pregel framework and integrates well with Apache Accumulo, Apache HBase, Apache Hive, and Cloudera Impala.
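Real Giraph jobs are written in Java against Giraph's own API, which this excerpt does not show. To give a rough feel for the Pregel-style, vertex-centric model that Giraph inherits, here is a minimal single-process Python simulation of "maximum value propagation": in each superstep a vertex reads its incoming messages, updates its value, messages its neighbors only if the value changed, and then votes to halt until new messages wake it up. The function `max_value_pregel` and the sample graph are hypothetical illustrations, not part of Giraph or Pregel.

```python
from collections import defaultdict


def max_value_pregel(values, edges):
    """Hypothetical Pregel-style simulation (not Giraph's API).

    values: dict mapping vertex -> initial numeric value
    edges:  dict mapping vertex -> list of out-neighbor vertices
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    active = set(values)          # every vertex is active in superstep 0
    inbox = defaultdict(list)     # messages delivered at the start of a superstep
    superstep = 0
    while active or inbox:
        outbox = defaultdict(list)
        # A vertex computes if it is still active or has received messages.
        for v in active | set(inbox):
            new_value = max([values[v]] + inbox.get(v, []))
            # Send on the first superstep, or whenever the value grew.
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for u in edges.get(v, []):
                    outbox[u].append(new_value)
        # Every vertex votes to halt; only incoming messages reactivate it.
        active = set()
        inbox = outbox
        superstep += 1
    return values, superstep


if __name__ == "__main__":
    # A small undirected chain a-b-c-d, expressed as edges in both directions.
    vals = {"a": 3, "b": 6, "c": 2, "d": 1}
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(max_value_pregel(vals, adj))
```

The computation ends when no vertex is active and no messages are in flight, which mirrors how a Pregel-style job decides it is finished; in Giraph the same logic would run in parallel across workers rather than in one loop.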

How-to: Create a Simple Hadoop Cluster with VirtualBox

Set up a CDH-based Hadoop cluster in less than an hour using VirtualBox and Cloudera Manager.

Thanks to Christian Javet for his permission to republish his blog post below!

NYU, Analytics, and Cloudera’s QuickStart VM

The Cloudera QuickStart VM is an important platform for any Hadoop-related curriculum.

In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.

This Month (and Year) in the Ecosystem (December 2013)

Welcome to our sixth edition of “This Month in the Ecosystem,” a digest of highlights from December 2013 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

The Cloudera Developer Newsletter: It’s For You!

The new Cloudera Developer Newsletter makes its debut in January 2014.

Developers and data scientists, we realize you’re special, as are operators and analysts in their own particular ways.

Developer Happy Hour with Cloudera: Building Hadoop 2 Applications

Join us at Cloudera’s San Francisco office on Feb. 20 for tech talks, T-shirts, and adult refreshments!

As an extension of the DeveloperWeek Conf & Festival 2014 experience in San Francisco next month, join us at Cloudera’s San Francisco office for a Developer Happy Hour (beer + tech talks) focusing on Apache Hadoop 2 application development. Anyone, whether attending the conference or not, is welcome, but RSVP now because seats (and “Data is the New Bacon” T-shirts) are limited!

The Hadoop FAQ for Oracle DBAs

Oracle DBAs, get answers to many of your most common questions about getting started with Hadoop.

As a former Oracle DBA, I get a lot of questions (most welcome!) from current DBAs in the Oracle ecosystem who are interested in Apache Hadoop. Here are a few of the more frequently asked questions, along with my most common replies.
