Category Archives: Oozie

How-To: Use Oozie Shell and Java Actions

Categories: General How-to Oozie Pig

Ed. Note (Oct. 16, 2015): This post has been updated for CDH 5.x; some external links have been updated as well.

Apache Oozie, the workflow coordinator for Apache Hadoop, has actions for running MapReduce, Apache Hive, Apache Pig, Apache Sqoop, and DistCp jobs; it also has a Shell action and a Java action. These last two actions allow us to execute arbitrary shell commands or Java code,

Read More

What’s New in Hue 2.2?

Categories: CDH General Hue Oozie

This post covers the new release of Hue included in CDH4.2. Hue is an open source, web-based interface that makes Apache Hadoop easier to use.

Hue lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features a file browser for HDFS, an Apache Oozie application for creating workflows of data processing jobs, a job designer/browser for MapReduce,

Read More

How-To: Schedule Recurring Hadoop Jobs with Apache Oozie

Categories: Guest Hive Oozie

Our thanks to guest author Jon Natkins (@nattyice) of WibiData for the following post!

Today, many (if not most) companies have ETL or data enrichment jobs that run on a regular basis as data becomes available. In this scenario, it is important to minimize the lag between data being created and being ready for analysis.

CDH, Cloudera’s open-source distribution of Apache Hadoop and related projects,

Read More

Apache Hadoop in 2013: The State of the Platform

Categories: Avro CDH Flume Hadoop HBase HDFS Hive Hue Impala Mahout MapReduce Oozie Pig Sqoop YARN ZooKeeper

For several good reasons, 2013 is a Happy New Year for Apache Hadoop enthusiasts.

In 2012, we saw continued progress on developing the next generation of the MapReduce processing framework (MRv2), work that will bear fruit this year. HDFS made major progress toward becoming a lights-out, fully enterprise-ready distributed filesystem with the addition of high availability features and increased performance. And a hint of the future of the Hadoop platform was provided with the beta release of Cloudera Impala,

Read More