How-to: Write an EL Function in Apache Oozie

When building complex workflows in Apache Oozie, it is often useful to parameterize them so that they can be reused, driven from a script, and more easily maintained. The most common way to do this is with ${VAR} variables. For example, instead of specifying the same NameNode for every action in a given workflow, you can specify something like ${myNameNode} and then define it in your job.properties file as myNameNode=hdfs://localhost:8020.

One advantage of this approach is that if you want to change the value (the NameNode, in this example), you only have to change it in one place, and all the actions will then use the new value. This can be particularly useful when testing in a dev or staging environment, where you can simply change a few variables instead of editing the workflow itself.
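To make the idea concrete, here is a toy Java sketch that mimics the effect of ${VAR} substitution. It loads the job.properties line from the NameNode example and replaces each ${name} in a string with the matching value. This is an illustration only: Oozie resolves these references itself and does not use this code. The class name VarSubstitutionSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of ${VAR} substitution (hypothetical; not Oozie's own resolver). */
public class VarSubstitutionSketch {

    // Matches ${name}, where name starts with a letter or underscore and may
    // contain letters, digits, underscores, and dots.
    private static final Pattern VAR = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_.]*)\\}");

    /** Replaces each ${name} with its value from props; undefined names are left as-is. */
    public static String substitute(String template, Properties props) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // The same line you would put in job.properties.
        props.load(new StringReader("myNameNode=hdfs://localhost:8020"));
        System.out.println(substitute("<name-node>${myNameNode}</name-node>", props));
    }
}
```

Changing the single property value changes every place the variable appears. That is the maintainability benefit described above.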

Oozie also provides functions, called EL (Expression Language) functions, that serve a similar purpose. A host of built-in EL functions provide things like the name of the user who submitted the workflow, the external ID of a MapReduce job submitted by Oozie, the concatenation of two strings, and much more. In this blog post, you’ll learn how to write your own EL function.

When to Use an EL Function

An EL function should be simple, fast, and robust. One of the reasons Oozie uses a MapReduce “launcher” job to run most of its actions is so that large and complicated programs are not executed directly in the Oozie server itself. EL functions, however, are executed in the Oozie server, so you don’t want to overtax it or threaten its stability with a poorly conceived one.

Some examples of “good” EL functions are ones that perform a simple string or arithmetic operation, or that return some useful internal value from Oozie. It is also important that your EL function be robust; you don’t want your workflows to fail because your function doesn’t handle edge cases, throws exceptions, or misbehaves in some other way (although a misbehaving EL function won’t bring down Oozie itself).


Examples of “bad” EL functions would be ones that spawn a bunch of threads to do heavy processing, download gigabytes of data from a remote server, and so on. These actions are far too heavy or brittle to run in the Oozie server. If you need to do something like this in a workflow, and possibly use the output in another action, a better alternative is to use the Shell or Java actions with the <capture-output/> element. (See my previous blog post, “How To: Use Oozie Shell and Java Actions”, for more information on that process.) Conversely, a Shell or Java action is overkill for simply concatenating two strings; a simple EL function makes more sense there.

Writing an EL Function

Now, let’s dive into an example of how to write your own extension EL function and how to use it in a workflow. Oozie comes with a number of string-manipulation EL functions, but what if you want to compare two strings to see if they are equal regardless of case? For example, the built-in equality comparison would consider “Foo” and “fOo” unequal, so let’s make a function that considers them equal. (This is essentially Java’s String#equalsIgnoreCase method; in fact, your function will call it.)

You don’t need to implement a specific interface or extend a specific class; a basic class whose static methods serve as your EL functions is enough. For example, below I’ve created a class named MyELFunctions in the rkanter package and a function named equalsIgnoreCase that takes two Strings and returns a boolean:
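The original listing is not reproduced here. What follows is a minimal sketch consistent with the description: a class rkanter.MyELFunctions with a public static equalsIgnoreCase(String, String) method that handles a null first argument. The copy on GitHub remains the authoritative version.

```java
package rkanter;

/**
 * Custom EL functions for Oozie (sketch). No interface or base class is required:
 * a public static method can be registered as an EL function.
 */
public class MyELFunctions {

    /**
     * Case-insensitive string comparison that delegates to String#equalsIgnoreCase.
     * Returns true if both arguments are null, and false if only one of them is.
     */
    public static boolean equalsIgnoreCase(String s1, String s2) {
        if (s1 == null) {
            // Calling a method on s1 would throw a NullPointerException,
            // so decide here: null only "equals" null.
            return s2 == null;
        }
        // String#equalsIgnoreCase already returns false when s2 is null.
        return s1.equalsIgnoreCase(s2);
    }
}
```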

 

(You can also find a copy of the above code on GitHub.)

As you can see, the function is fairly simple. To make the code more robust, it first checks whether the first String is null; if it is, the function returns true if the second String is also null, and false otherwise. This extra check prevents a NullPointerException, because equalsIgnoreCase can’t be called on a null reference. A null second String needs no special handling, since equalsIgnoreCase simply returns false for a null argument.
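The asymmetry between the two arguments comes straight from String#equalsIgnoreCase. A null argument is accepted and yields false. A null receiver cannot be dereferenced, so the call throws. The small demo class below (NullCheckDemo, a hypothetical name) makes both behaviors visible.

```java
/** Shows why only the first argument needs an explicit null check. */
public class NullCheckDemo {

    /** Returns true if receiver.equalsIgnoreCase(argument) throws a NullPointerException. */
    public static boolean throwsNpe(String receiver, String argument) {
        try {
            receiver.equalsIgnoreCase(argument);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A null *argument* is handled by String#equalsIgnoreCase itself.
        System.out.println("Foo".equalsIgnoreCase(null)); // false
        // A null *receiver* cannot be dereferenced, so the call throws.
        System.out.println(throwsNpe(null, "Foo"));       // true
    }
}
```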

The next thing to do is compile the code (which would be in rkanter/MyELFunctions.java) and package it into a jar file. There are many ways to build a jar file from Java code (Ant, Maven, NetBeans, IntelliJ, Eclipse, and so on), but here you can simply use the javac and jar commands included with the JDK. Compile the Java file into a class like this:

 

And package the compiled class into a jar file like this:

 

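To confirm the build, list the jar file’s contents (for example, with jar tf MyELFunctions.jar); the compiled class should appear as rkanter/MyELFunctions.class.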
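The jar’s contents can also be checked from Java. The sketch below lists the entries with java.util.jar.JarFile and reports whether rkanter/MyELFunctions.class is present. The class name JarContents and the default path MyELFunctions.jar are assumptions for illustration. The entry path is worth checking because a class in package rkanter has to sit at rkanter/MyELFunctions.class inside the jar. Only then can the class loader find it as rkanter.MyELFunctions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists a jar's entries and checks for an expected class (hypothetical helper). */
public class JarContents {

    /** Returns the names of all entries in the jar at jarPath. */
    public static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Converts a class name such as "rkanter.MyELFunctions" to its jar entry path. */
    public static String classEntryPath(String className) {
        return className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) throws IOException {
        // Defaults assume the jar and class names used in this post.
        String jarPath = args.length > 0 ? args[0] : "MyELFunctions.jar";
        String expected = classEntryPath("rkanter.MyELFunctions");
        List<String> names = entryNames(jarPath);
        names.forEach(System.out::println);
        System.out.println(names.contains(expected)
                ? "OK: found " + expected
                : "Missing " + expected + "; check the package directory used when building the jar");
    }
}
```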

 

Configuring Oozie to Use Your EL Function

Now that you have an EL function in a jar file, the next step is to configure Oozie to use it. The Oozie server must be restarted to notice the jar file, so first make sure that the Oozie server isn’t running.

CDH package/parcel installation:

  • Copy MyELFunctions.jar to /var/lib/oozie/ (or to /usr/lib/oozie/libext/, which is simply a symlink to the former).

Tarball installation (CDH or Apache):

  • Copy MyELFunctions.jar to /where/you/deployed/oozie/libext/ (creating libext if it doesn’t exist).
  • Run the ‘bin/oozie-setup.sh prepare-war’ command (in earlier versions of Oozie, this is simply ‘bin/oozie-setup.sh’ with no arguments).

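Now that Oozie has the jar file, you have to tell Oozie how to use the EL function. In oozie-site.xml, set the oozie.service.ELService.ext.functions.workflow property (creating it if it doesn’t already exist, or appending to its existing value) so that it includes the declaration equalsIgnoreCase=rkanter.MyELFunctions#equalsIgnoreCase.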

 

This property takes a comma-separated list of EL function declarations that you want to add to the original built-in EL functions. The format of a declaration is [PREFIX:]NAME=CLASS#METHOD, where PREFIX is an optional prefix such as “wf”, NAME is the name you want to give your EL function as used in your workflows, CLASS is the fully qualified name of the class where you defined your EL function, and METHOD is the name of the method in that CLASS. In our example, we didn’t use a prefix, we named the function “equalsIgnoreCase”, the class is “rkanter.MyELFunctions”, and the method is “equalsIgnoreCase”.
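To make the [PREFIX:]NAME=CLASS#METHOD format concrete, here is a hypothetical parser that splits a property value into its parts. It only illustrates the format as described above. It is not Oozie’s ELService implementation, and the class and field names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for [PREFIX:]NAME=CLASS#METHOD declarations (not Oozie code). */
public class ElDeclarationSketch {

    /** One parsed declaration; prefix is "" when none was given. */
    public static final class Declaration {
        public final String prefix;
        public final String name;
        public final String className;
        public final String method;

        Declaration(String prefix, String name, String className, String method) {
            this.prefix = prefix;
            this.name = name;
            this.className = className;
            this.method = method;
        }

        @Override
        public String toString() {
            return (prefix.isEmpty() ? "" : prefix + ":") + name + " -> " + className + "#" + method;
        }
    }

    /** Splits a comma-separated property value into declarations. */
    public static List<Declaration> parse(String propertyValue) {
        List<Declaration> result = new ArrayList<>();
        for (String raw : propertyValue.split(",")) {
            String decl = raw.trim();
            if (decl.isEmpty()) {
                continue; // tolerate blank entries such as a trailing comma
            }
            int eq = decl.indexOf('=');
            int hash = decl.lastIndexOf('#');
            if (eq < 0 || hash < eq) {
                throw new IllegalArgumentException("Malformed declaration: " + decl);
            }
            String qualifiedName = decl.substring(0, eq).trim();
            int colon = qualifiedName.indexOf(':');
            String prefix = colon < 0 ? "" : qualifiedName.substring(0, colon);
            String name = colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
            String className = decl.substring(eq + 1, hash).trim();
            String method = decl.substring(hash + 1).trim();
            result.add(new Declaration(prefix, name, className, method));
        }
        return result;
    }

    public static void main(String[] args) {
        // The declaration used in this post (no prefix).
        for (Declaration d : parse("equalsIgnoreCase=rkanter.MyELFunctions#equalsIgnoreCase")) {
            System.out.println(d);
        }
    }
}
```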

Finally, restart Oozie.

Using Your EL Function in a Workflow

As an example of using your EL function in a workflow, let’s take a look at the Shell example workflow included with Oozie:

 

This workflow simply uses the shell action (“shell-node”) to echo “Hello Oozie”, then goes to a decision node (“check-output”) that fails the workflow if the previous action didn’t actually output “Hello Oozie”. The actual check is done using the following EL expression:

 

If we want to use our new EL function, we can modify the above EL expression to call equalsIgnoreCase, which won’t care whether the shell action output “Hello Oozie” or something like “heLLO OoZiE”:

 

After making the above change and submitting the modified shell example workflow, it should still succeed, because our equalsIgnoreCase EL function doesn’t care that the cases don’t match! You can verify that it’s actually working correctly in two ways: try something that truly isn’t equal, such as “foo”, with the new expression, and try “heLLo OoZiE” with the original EL expression (which uses the normal, case-sensitive comparison). The workflow should fail in both cases.
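The outcomes of those experiments can be modeled in plain Java. This sketch does not evaluate EL. It assumes the decision compares the captured shell output against “Hello Oozie”. The method passesBuiltIn stands in for the original case-sensitive check and passesCustom for the new equalsIgnoreCase expression; both names are hypothetical.

```java
/** Plain-Java model of the "check-output" decision (not an EL evaluation). */
public class DecisionSketch {

    static final String EXPECTED = "Hello Oozie";

    /** Models the original, case-sensitive comparison. */
    public static boolean passesBuiltIn(String output) {
        return EXPECTED.equals(output);
    }

    /** Models the modified expression that calls the equalsIgnoreCase EL function. */
    public static boolean passesCustom(String output) {
        return output != null && output.equalsIgnoreCase(EXPECTED);
    }

    public static void main(String[] args) {
        for (String output : new String[] {"Hello Oozie", "heLLo OoZiE", "foo"}) {
            System.out.printf("%-12s built-in=%-5b custom=%b%n",
                    output, passesBuiltIn(output), passesCustom(output));
        }
    }
}
```

Running it should show that “Hello Oozie” passes both checks, “heLLo OoZiE” passes only the case-insensitive one, and “foo” passes neither. This matches the workflow results described above.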

Conclusion

Writing custom EL functions is a powerful way to add new capabilities to Oozie and make your workflows more reusable and modular. This post covered only a very simple EL function, but because it’s just Java code, you are free to write much more powerful ones. We also looked only at adding EL functions to workflows, but they can be added to coordinators as well. If you think other users might benefit from your EL functions, please feel free to contribute them to the community by creating a patch and uploading it to a new JIRA at http://issues.apache.org/jira/browse/OOZIE.


Robert Kanter is a Software Engineer on the Platform team and an Apache Oozie Committer/PMC Member.
