Category Archives: Pig

Using Apache Hadoop to Find Signal in the Noise: Analyzing Adverse Drug Events

Categories: General Hadoop MapReduce Pig Use Case

Last month at the Web 2.0 Summit in San Francisco, Cloudera CEO Mike Olson presented some work the Cloudera Data Science Team did to analyze adverse drug events. We decided to share more detail about this project because it demonstrates how to use a variety of open-source tools – R, Gephi, and Cloudera’s Distribution Including Apache Hadoop (CDH) – to solve an old problem in a new way.

Read more

New Features in Apache Pig 0.8

Categories: Pig

This is a guest post contributed by Dmitriy Ryaboy (@squarecog) and was originally published in his blog on December 19th. We thought the information was valuable enough that it was worth reposting to spread the word even further. 

The Pig 0.8 release includes a large number of bug fixes and optimizations, but at the core it is a feature release. It’s been in the works for almost a full year (most of the work on 0.7 was completed by January of 2009,

Read more

Migrating to CDH

Categories: General Hadoop HBase HDFS Hive MapReduce Pig ZooKeeper

With the recent release of CDH3b2, many users are more interested than ever to try out Cloudera’s Distribution for Hadoop (CDH). One of the questions we often hear is, “what does it take to migrate?”.

Why Migrate?

If you’re not familiar with CDH3b2, here’s what you need to know.

All versions of CDH provide:

  • RPM and Debian packages for simple installation and management.
  • Clean integration with the host operating system.

Read more

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop.  Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time:0.18.3. We packaged up Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier.  For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands depending on your Linux distribution:

As proof of this,

Read more