Category Archives: CDH

MapReduce 2.0 in Apache Hadoop 0.23

Categories: CDH General Hadoop MapReduce

In Building and Deploying MR2, we presented a brief introduction to MapReduce in Apache Hadoop 0.23 and focused on the steps to set up a single-node cluster. This post provides developers with architectural details of the new MapReduce design.

Apache Hadoop 0.23 has major improvements over previous releases. Here are a few highlights on the MapReduce front; note that there are also major HDFS improvements, which are outside the scope of this post.

Read more

Introducing CDH4

Categories: CDH General

I’m pleased to inform our users and customers that Cloudera has released the fourth version of Cloudera’s Distribution Including Apache Hadoop (CDH) into beta today. This release combines the input from our enterprise customers, partners, and users with the hard work of Cloudera engineering and the larger Apache open source community to create what we believe is a compelling advance for this widely adopted platform.

There are a great many improvements and new capabilities in CDH4 compared to CDH3.

Read more

Cloudera Connector for Tableau Has Been Released

Categories: CDH Hive

Earlier today, Cloudera proudly released the Cloudera Connector for Tableau. The availability of this connector serves both Tableau users who are looking to expand the volume of datasets they manipulate and Hadoop users who want to enable analysts, such as Tableau users, to make the data within Hadoop more meaningful. Enterprises can now extract the full value of big data and allow a new class of power users to interact with Hadoop data in ways they previously could not.

Read more

CDH3, update 3 now available

Categories: CDH General

In keeping with our release policy for Cloudera’s Distribution Including Apache Hadoop (CDH), I’m pleased to announce the availability of update 3 for CDH3. As a reminder, we ship updates for our most recent GA distribution every three months. Updates primarily include bug fixes, but when possible we will also include features from our mid-term roadmap. We’ll only include new features when they do not introduce instability or break compatibility. As always, users have the option to skip updates without incurring any future upgrade cost.

Read more

Seismic Data Science: Reflection Seismology and Hadoop

Categories: CDH General Hadoop Use Case

When most people first hear about data science, it’s usually in the context of how prominent web companies work with very large data sets in order to predict clickthrough rates, make personalized recommendations, or analyze UI experiments. The solutions to these problems require expertise with statistics and machine learning, and so there is a general perception that data science is intimately tied to these fields. However, in my conversations at academic conferences and with Cloudera customers,

Read more