Category Archives: CDH

SCM Express: Now Anyone Can Experience the Power of Apache Hadoop

Categories: CDH General

Phil Langdale is a software engineer at Cloudera and the technical lead for Cloudera’s SCM Express product.

What is SCM Express?


As powerful and useful as Apache Hadoop is, anyone who has set up a cluster from scratch is well aware of how challenging it can be: every machine has to have the right packages installed and correctly configured so that they can all work together,

Read more

If 80% of data is unstructured, is it the exception or a new rule?

Categories: CDH Community

Ed Albanese leads business development for Cloudera. He is responsible for identifying new markets, revenue opportunities and strategic alliances for the company.

This week’s announcement of the availability of the Cloudera Connector for IBM Netezza marks a major milestone, but not necessarily the one you might expect. It’s not just the delivery of a useful software component; it’s also the introduction of a new generation of data management architectures.

Read more

Apache HBase Do’s and Don’ts

Categories: CDH Community HBase

I recently gave a talk at the LA Hadoop User Group about Apache HBase Do’s and Don’ts. The audience was excellent and asked very informed, well-articulated questions. Jody from Shopzilla was an excellent host, and I owe him a big thanks for giving me the opportunity to speak with over 60 LA Hadoopers. Since not everyone lives in LA or could make it to the meetup, I’ve summarized some of the salient points here.

Read more

Apache Hadoop Availability

Categories: CDH General Hadoop HDFS MapReduce

A common question on the Apache Hadoop mailing lists is: what’s going on with availability? This post looks at availability in the context of Hadoop and gives an overview of the work in progress and where things are headed.

Background

When discussing Hadoop availability, people often start with the NameNode since it is a single point of failure (SPOF) in HDFS, and most components in the Hadoop ecosystem (MapReduce,

Read more