How-to: Import a Pre-existing Oozie Workflow into Hue

Hue is an open-source web interface for Apache Hadoop packaged with CDH that focuses on improving the overall experience for the average user. The Apache Oozie application in Hue provides an easy-to-use interface to build workflows and coordinators. Basic management of workflows and coordinators is available through the dashboards with operations such as killing, suspending, or resuming a job.

Prior to Hue 2.2 (included in CDH 4.2), there was no way to manage workflows within Hue that were created outside of Hue. As of Hue 2.2, a pre-existing Oozie workflow can be imported from its XML definition.

How to Import a Workflow

Importing a workflow is pretty straightforward. All it requires is the workflow definition file and access to the Oozie application in Hue. Follow these steps to import a workflow:

  1. Go to Oozie Editor/Dashboard > Workflows and click the “Import” button.

  2. Provide at minimum a name and workflow definition file.

  3. Click “Save”. This will redirect you to the workflow builder with a message in blue near the top stating “Workflow imported”.

How It Works

The definition file describes a workflow well enough for Hue to infer its structure. It also provides the majority of the attributes associated with a node, with the exception of some resource references. Resource reference handling is detailed in the following paragraphs.

A workflow is imported into Hue by uploading its XML definition. Its nodes are transformed into Django serialized objects, which Hue then loads.

Workflow transformation pipeline (without hierarchy resolution)

Workflow Definitions Transformation

Workflow definitions have a general form, which makes them easy to transform. There are several kinds of nodes, each with a unique representation, and recurring patterns simplify the task of transforming the definition XML. Consider the following workflow definition:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs>
      <delete path="${nameNode}${output}/testfs" />
      <mkdir path="${nameNode}${output}/testfs" />
      <mkdir path="${nameNode}${output}/testfs/source" />
      <move source="${nameNode}${output}/testfs/source" target="${nameNode}${output}/testfs/renamed" />
      <chmod path="${nameNode}${output}/testfs/renamed" permissions="700" dir-files="false" />
      <touchz path="${nameNode}${output}/testfs/new_file" />
    </fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end" />
</workflow-app>

 

Nodes are children of the root element workflow-app. Every node has a unique representation that differs from the others in at least its name. Every action is defined by an action element with a unique name. Its immediate children are the action type element and the links. The children of the action type element are the properties associated with the action. The start, end, fork, decision, join, and kill nodes each have their own transformation, whereas actions are transformed using a general Extensible Stylesheet Language Transformation, or XSLT.
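The node layout described above can be explored with a short Python sketch. It uses only the standard library and a trimmed copy of the fs-test workflow. The helper names (local_name, list_nodes) are hypothetical and are not part of Hue, which does this work with XSLT.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fs-test workflow shown earlier.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs>
      <mkdir path="${nameNode}${output}/testfs" />
    </fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill">
    <message>Action failed</message>
  </kill>
  <end name="end" />
</workflow-app>"""

LINK_TAGS = {"ok", "error"}


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.rsplit("}", 1)[-1]


def list_nodes(xml_text):
    """Return (node kind, node name, action type) for each child of workflow-app."""
    root = ET.fromstring(xml_text)
    nodes = []
    for child in root:
        kind = local_name(child)
        # <start> carries no name attribute, so fall back to its kind.
        name = child.get("name", kind)
        action_type = None
        if kind == "action":
            # The action type is the first child that is not a link.
            for sub in child:
                if local_name(sub) not in LINK_TAGS:
                    action_type = local_name(sub)
                    break
        nodes.append((kind, name, action_type))
    return nodes


if __name__ == "__main__":
    for node in list_nodes(WORKFLOW_XML):
        print(node)
```

Each tuple names the node kind, the node name, and, for actions only, the element that identifies the action type.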

Most attributes are not unique to a single action. For instance, the Hive action and the Sqoop action both have the prepare attribute. Hue provides an XSLT for every action type; each stylesheet imports the transformations for shared attributes and defines transformations only for the attributes unique to that action. In the XSLT below, the sqoop action is defined by importing all of the general fields and defining any Sqoop-specific fields:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:workflow="uri:oozie:workflow:0.1" xmlns:sqoop="uri:oozie:sqoop-action:0.2" version="1.0" exclude-result-prefixes="workflow sqoop">
  <xsl:import href="../nodes/fields/archives.xslt" />
  <xsl:import href="../nodes/fields/files.xslt" />
  <xsl:import href="../nodes/fields/job_properties.xslt" />
  <xsl:import href="../nodes/fields/job_xml.xslt" />
  <xsl:import href="../nodes/fields/params.xslt" />
  <xsl:import href="../nodes/fields/prepares.xslt" />
  <xsl:template match="sqoop:sqoop">
    <object model="oozie.sqoop" pk="0">
      <xsl:call-template name="archives" />
      <xsl:call-template name="files" />
      <xsl:call-template name="job_properties" />
      <xsl:call-template name="job_xml" />
      <xsl:call-template name="params" />
      <xsl:call-template name="prepares" />
      <field name="script_path" type="CharField">
        <xsl:value-of select="*[local-name()='command']" />
      </field>
    </object>
  </xsl:template>
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
</xsl:stylesheet>

 

The above XSLT imports transformation definitions for the archives, files, job properties, job XML, params, and prepares attributes. If a Sqoop action XML definition were to be transformed by the above XSLT, the resulting XML would take on the following form:

<object model="oozie.sqoop" pk="0">
  <field name="archives" type="TextField">...</field>
  <field name="files" type="TextField">...</field>
  <field name="job_properties" type="TextField">...</field>
  <field name="job_xml" type="TextField">...</field>
  <field name="params" type="TextField">...</field>
  <field name="prepares" type="TextField">...</field>
  <field name="script_path" type="CharField">...</field>
</object>
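To make the mapping concrete, here is a hand-written Python stand-in for the Sqoop-specific part of the stylesheet. It is only a sketch: Hue performs this step with XSLT, the function name sqoop_to_object is hypothetical, the sample command is invented for illustration, and the shared fields (archives, files, and so on) are omitted.

```python
import xml.etree.ElementTree as ET

# Illustrative Sqoop action body; the command text is a made-up example.
SQOOP_ACTION_XML = """<sqoop xmlns="uri:oozie:sqoop-action:0.2">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <command>import --connect jdbc:example --table t</command>
</sqoop>"""


def sqoop_to_object(xml_text):
    """Build the <object> element with only the Sqoop-specific script_path field."""
    action = ET.fromstring(xml_text)
    obj = ET.Element("object", {"model": "oozie.sqoop", "pk": "0"})
    field = ET.SubElement(
        obj, "field", {"name": "script_path", "type": "CharField"}
    )
    # Mirrors select="*[local-name()='command']": take the first child whose
    # local name is 'command', ignoring its namespace.
    command = next(
        (c for c in action if c.tag.rsplit("}", 1)[-1] == "command"), None
    )
    field.text = "".join(command.itertext()) if command is not None else ""
    return obj


if __name__ == "__main__":
    print(ET.tostring(sqoop_to_object(SQOOP_ACTION_XML), encoding="unicode"))
```

As in the stylesheet, the content of the command element ends up in the script_path field.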

 

Workflow Structure Resolution

The structure of the workflow is created after the nodes are imported. Internally, the workflow hierarchy is represented as a set of “links” between nodes. The workflow definition contains references to next nodes in the graph through the tags ok, error, and start. These references are used to create transitions. The following code snippet illustrates a transition that goes to a node called end and an error transition that goes to a node named kill:

<ok to="end" />
<error to="kill" />
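Transitions like these can be collected into a list of links with a few lines of Python. The sketch below is an assumption-laden illustration, not Hue's implementation: the (parent, link kind, child) triple format and the function extract_links are hypothetical, and fork, decision, and join nodes are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed workflow with one action and its two transitions.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs />
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill">
    <message>Action failed</message>
  </kill>
  <end name="end" />
</workflow-app>"""


def local_name(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_links(xml_text):
    """Return (parent node, link kind, child node) triples for start and action nodes."""
    root = ET.fromstring(xml_text)
    links = []
    for node in root:
        kind = local_name(node.tag)
        if kind == "start":
            # The start node points directly at the first node to run.
            links.append(("start", "to", node.get("to")))
        elif kind == "action":
            for child in node:
                link_kind = local_name(child.tag)
                if link_kind in ("ok", "error"):
                    links.append((node.get("name"), link_kind, child.get("to")))
    return links


if __name__ == "__main__":
    for link in extract_links(WORKFLOW_XML):
        print(link)
```

Because links refer to nodes by name, they can only be resolved once every node has been imported, which matches the ordering described above.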

 

Workflow definitions do not contain resources, such as the jar file used when running a MapReduce action. Because this information is not in the workflow definition, Hue leaves it out when performing the transformation. As a result, users must update any resource-specific information within actions after importing.


An imported workflow. Note that its resource information is missing.

Summary and Next Steps

Hue can manage workflows with its dynamic workflow builder and now, officially, can import predefined workflows into its system. Another benefit of parsing the XML definition is that it enables all workflows to be displayed as a graph in the dashboard:


Dashboard graph of an imported workflow

The workflow import process is good, but not yet perfect. Ideally, the resources discussed above would be found on the system and validated before the workflow is imported, or users could optionally provide them.

Have any suggestions? Feel free to tell us what you think via hue-user.

Abraham Elmahrek is a Software Engineer on the Platform team.
