Cloudera Developer Blog

Big Data best practices, how-to's, and internals from Cloudera Engineering and the community


Getting MapReduce 2 Up to Speed

Thanks to the improvements described here, CDH 5 will ship with a version of MapReduce 2 that is just as fast as (or faster than) MapReduce 1.

Performance fixes are tiny, easy, and boring, once you know what the problem is. The hard work is in putting your finger on that problem: narrowing, drilling down, and measuring, measuring, measuring.

Best Practices for Deploying Cloudera Enterprise on Amazon Web Services

This FAQ contains answers to the most frequently asked questions about the architecture and configuration choices involved.

In December 2013, Cloudera and Amazon Web Services (AWS) announced a partnership to support Cloudera Enterprise on AWS infrastructure. Along with this announcement, we released a Deployment Reference Architecture Whitepaper. In this post, you’ll get answers to the most frequently asked questions about the architecture and the configuration choices that have been highlighted in that whitepaper.

Cloudera Enterprise 5 Beta 2 is Available: More New Features and Components

Cloudera has released the Beta 2 version of Cloudera Enterprise 5 (comprising CDH 5.0.0 and Cloudera Manager 5.0.0).

This release (available for download) contains a number of new features and component versions, including the ones below:

Migrating from Hive CLI to Beeline: A Primer

Migrating from the Hive CLI to Beeline isn’t as simple as changing the executable name, but this post makes it easy nonetheless.

In its original form, Apache Hive was a heavyweight command-line tool that accepted queries and executed them using MapReduce. Later, the tool split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (which sends SQL to the server).

This Month in the Ecosystem (January 2014)

Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from January 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

How-to: Write and Run Giraph Jobs on Hadoop

Create a test environment for writing and testing Giraph jobs, or just for playing around with Giraph and small sample datasets.

Apache Giraph is a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes. Giraph is in use at companies like Facebook and PayPal, for example, to help represent and analyze the billions (or even trillions) of connections across massive datasets. Giraph was inspired by Google’s Pregel framework and integrates well with Apache Accumulo, Apache HBase, Apache Hive, and Cloudera Impala.

Spark is Now Generally Available for Cloudera Enterprise

Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.

Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).

How Wajam Answers Business Questions Faster With Hadoop

Thanks to Xavier Clements of Wajam for allowing us to re-publish his blog post about Wajam’s Hadoop experiences below!

Wajam is a social search engine that gives you access to the knowledge of your friends. We gather your friends’ recommendations from Facebook, Twitter, and other social platforms and serve these back to you on supported sites like Google, eBay, TripAdvisor, and Wikipedia.

How-to: Create a Simple Hadoop Cluster with VirtualBox

Set up a CDH-based Hadoop cluster in less than an hour using VirtualBox and Cloudera Manager.

Thanks to Christian Javet for his permission to republish his blog post below!

Pro Tips for Pitching an HBaseCon Talk

These suggestions from the Program Committee offer an inside track to getting your talk accepted!

With the Call for Papers for HBaseCon 2014 (in San Francisco on May 5) closing in just over three weeks (on Feb. 14, sooner than you think), there’s no better time than “now” to start thinking about your proposal.
