What’s New in CDH4.1 Hue

Hue is a Web-based interface that makes it easier to use Apache Hadoop. Hue 2.1 (included in CDH4.1) provides a new application on top of Apache Oozie (a workflow scheduler system for Apache Hadoop) for creating workflows and scheduling them to run repeatedly. For example, Hue makes it easy to group a set of MapReduce jobs and Hive scripts and run them every day of the week.

In this post, we’re going to focus on the Workflow component of the new application.

Workflow Editor

Workflows consist of one or more actions that can be executed sequentially or in parallel. Each action runs a program that can be parameterized (e.g. output=${OUTPUT} instead of a hardcoded directory path) so that it is easy to reuse.
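Oozie resolves `${...}` references when a job is submitted. As a rough illustration only, Python's `string.Template` happens to use the same `${NAME}` syntax, so the idea of a reusable, parameterized setting can be sketched with it. This is not how Oozie evaluates its EL expressions, and the `resolve` helper and the path value are hypothetical.

```python
from string import Template

# The action setting from the text: the output directory is a parameter,
# not a hardcoded path.
action_property = Template("output=${OUTPUT}")

def resolve(template: Template, params: dict) -> str:
    """Substitute every ${NAME}; raises KeyError if a parameter is missing."""
    return template.substitute(params)

# The same definition can be reused with a different output each run
# (the path below is a made-up example).
print(resolve(action_property, {"OUTPUT": "/user/hue/terasort/out"}))
# output=/user/hue/terasort/out
```

Failing loudly on a missing value (rather than leaving `${OUTPUT}` in place) mirrors why the Dashboard asks for parameter values before a workflow can run.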

The currently supported program types are:

  • MapReduce
  • Pig
  • Hive
  • Sqoop
  • Java
  • Shell
  • SSH
  • Streaming jobs
  • DistCp

The application comes with a set of example workflows.


Workflows can be shared with other users and cloned. Forks are supported, so several actions can run at the same time. The Workflow Editor is where you compose a workflow.

Let’s take the Sequential Java (aka TeraSort) example and add a Hive action, HiveGen, that generates some random data. TeraGen, a MapReduce job, does the same thing, and the two actions will run in parallel. Finally, the TeraSort action reads both outputs and sorts them together. You can see how this looks in Hue in the screenshot below.
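The dependency structure of this example can be sketched as a small directed graph: HiveGen and TeraGen have no prerequisites, and TeraSort waits for both. The snippet below is a hypothetical model (not Hue or Oozie code) that uses Python's standard `graphlib` module (Python 3.9+) to group actions into stages that may run in parallel, which is what the fork/join pair in the editor expresses.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions it must wait for.
workflow = {
    "HiveGen": set(),
    "TeraGen": set(),
    "TeraSort": {"HiveGen", "TeraGen"},
}

def parallel_stages(deps):
    """Return a list of stages; actions within one stage can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every action whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages

print(parallel_stages(workflow))
# [['HiveGen', 'TeraGen'], ['TeraSort']]
```

The first stage corresponds to the fork (both generators start together) and the second to what happens after the join.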

Workflow Dashboard

Our TeraGen workflow can then be submitted and controlled from the Dashboard. Parameter values (e.g. ${OUTPUT}, the output path of the TeraSort action) are prompted for when you click the Submit button.
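The post does not describe how Hue finds which parameters to ask for. A plausible approach, sketched here purely as an assumption, is to scan the action settings for plain `${NAME}` references and prompt only for those without a default. The `parameters_to_prompt` helper and the sample settings are hypothetical; `${wf:id()}` is a real Oozie EL function that the pattern deliberately skips.

```python
import re

# Matches plain ${NAME} parameters. EL function calls such as ${wf:id()}
# contain ':' and '(' so they are not matched.
PARAM = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def parameters_to_prompt(texts, defaults=None):
    """Return the sorted parameter names referenced in texts that lack a default."""
    defaults = defaults or {}
    found = set()
    for text in texts:
        found.update(PARAM.findall(text))
    return sorted(found - set(defaults))

action_settings = [
    "output=${OUTPUT}",
    "input=${INPUT}",
    "log=/tmp/${wf:id()}",
]
print(parameters_to_prompt(action_settings, defaults={"INPUT": "/data/in"}))
# ['OUTPUT']
```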

Jobs can be filtered, killed and restarted, and detailed information (progress, logs) is available both within the application and in the Job Browser application.
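Outside of Hue, the same operations are exposed by Oozie's Web Services API: `GET /v1/jobs` lists jobs (with `jobtype`, `len` and a `filter` such as `status=RUNNING`), and `PUT /v1/job/<id>?action=...` controls a single job. The sketch below only builds those request URLs, without sending anything. The server address is an assumption (11000 is Oozie's default port), and the helper names are hypothetical.

```python
from urllib.parse import urlencode, quote

# Assumed Oozie server URL.
OOZIE = "http://localhost:11000/oozie"

def list_workflows_url(status=None, length=50):
    """URL to GET for listing workflow jobs, optionally filtered by status."""
    params = {"jobtype": "wf", "len": length}
    if status:
        params["filter"] = f"status={status}"
    return f"{OOZIE}/v1/jobs?{urlencode(params)}"

def job_action_url(job_id, action):
    """URL to PUT for controlling one job (kill, suspend, resume, rerun...)."""
    if action not in {"start", "suspend", "resume", "kill", "rerun"}:
        raise ValueError(f"unsupported action: {action}")
    return f"{OOZIE}/v1/job/{quote(job_id)}?action={action}"

print(list_workflows_url("RUNNING"))
# http://localhost:11000/oozie/v1/jobs?jobtype=wf&len=50&filter=status%3DRUNNING
```

A rerun also needs a job configuration in the request body, which this sketch leaves out.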

Each workflow can also be managed individually from its own page. The active actions are shown in orange below:

Summary

Before CDH4.1, Oozie users had to deal with XML files and command-line programs. Now, this new application lets users build, monitor and control their workflows from a single Web application. It also leverages the Hue File Browser (for listing and uploading workflows) and the Job Browser (for accessing fine-grained details of the jobs).
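To give a sense of the XML the editor spares you from writing, here is a hypothetical Python sketch that emits only the control-flow skeleton of the TeraSort fork/join example (start, fork, join, kill and end nodes). The node names, the `uri:oozie:workflow:0.2` schema version and the helper name are assumptions, and this is not the XML Hue actually generates. Because the action bodies are omitted, Oozie would reject the output as-is.

```python
import xml.etree.ElementTree as ET

def fork_join_skeleton(name, parallel, final):
    """Build start -> fork(parallel actions) -> join -> final action -> end.

    Real actions also need a body such as <map-reduce> or <hive>;
    those are left out so only the control flow is visible.
    """
    app = ET.Element("workflow-app", name=name, xmlns="uri:oozie:workflow:0.2")
    ET.SubElement(app, "start", to="fork-node")
    fork = ET.SubElement(app, "fork", name="fork-node")
    for action in parallel:
        ET.SubElement(fork, "path", start=action)
    for action in parallel:
        node = ET.SubElement(app, "action", name=action)
        ET.SubElement(node, "ok", to="join-node")  # success waits at the join
        ET.SubElement(node, "error", to="kill")
    ET.SubElement(app, "join", name="join-node", to=final)
    last = ET.SubElement(app, "action", name=final)
    ET.SubElement(last, "ok", to="end")
    ET.SubElement(last, "error", to="kill")
    kill = ET.SubElement(app, "kill", name="kill")
    ET.SubElement(kill, "message").text = "Action failed"
    ET.SubElement(app, "end", name="end")
    return ET.tostring(app, encoding="unicode")

print(fork_join_skeleton("terasort-wf", ["HiveGen", "TeraGen"], "TeraSort"))
```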

The next version of the Oozie application will focus on improving the general experience, increasing the number of supported Oozie workflows and prettifying the Editor.

In the meantime, feel free to send feedback and feature requests to hue-user!
