How-to: Use the Apache Oozie REST API

Apache Oozie has a Java client and a Java API for submitting and monitoring jobs, but what if you want to use Oozie from another language or a non-Java system? Oozie provides a Web Services API, which is an HTTP REST API. That is, you can do anything with Oozie simply by making requests to the Oozie server over HTTP. In fact, this is how the Oozie client and Oozie Java API themselves talk to the Oozie server. 

In this how-to, I’ll explain how the REST API works.

What is REST?

REST (Representational State Transfer) is a stateless architectural style for a client and server to communicate over HTTP. The client typically makes HTTP requests and the server sends back an HTTP response. The Oozie server accepts GET, PUT, and POST requests depending on the command. GET is typically used for commands that are querying the server for information and don’t have any side-effects (e.g. asking for a list of jobs). PUT is typically used for commands that are changing an already existing job (e.g. suspending a job). And POST is used for submitting a job. 
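
As a minimal sketch of how the three verbs map onto requests, the following Python 3 snippet builds (but does not send) one request of each kind using only the standard library. The helper name `oozie_request`, the `OOZIE_URL` base, and the job ID are assumptions for illustration; the URL paths follow the curl examples later in this post.

```python
import urllib.parse
import urllib.request

# Assumed base URL of a local Oozie server; adjust for your setup.
OOZIE_URL = "http://localhost:11000/oozie"


def oozie_request(method, path, params=None, body=None):
    """Build (without sending) an HTTP request for the Oozie REST API."""
    url = OOZIE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, data=body, method=method)


# GET: query without side effects (list workflow jobs).
list_jobs = oozie_request("GET", "/v1/jobs", {"jobtype": "wf"})

# PUT: change an existing job (suspend it). The job ID is a placeholder.
suspend = oozie_request(
    "PUT",
    "/v1/job/0000000-130524111605784-oozie-rkan-W",
    {"action": "suspend"},
)

# POST: submit a new job; the body would be the job configuration XML.
submit = oozie_request("POST", "/v1/jobs", body=b"<configuration/>")

for req in (list_jobs, suspend, submit):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen()` would actually send the request to the server.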

JSON (JavaScript Object Notation) is a standard format for sending structured data over a network (similar to XML, but generally considered more human-readable). Like many REST APIs, the Oozie server sends its responses back as JSON to make them easy to parse. There are a few exceptions, though: logs are returned as plain text, workflow/coordinator/bundle definitions are returned as XML, and job graphs are returned as PNG images.
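
Because the response format varies by command, a client usually has to look at the response before parsing it. The sketch below picks a decoder based on the Content-Type header. The function name `decode_body` is hypothetical, and the specific header values it checks are an assumption about how the server labels each kind of response, so verify them against your own server.

```python
import json
import xml.etree.ElementTree as ET


def decode_body(content_type, body):
    """Decode a response body (bytes) according to its Content-Type header."""
    # Drop parameters such as ";charset=UTF-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/json":
        return json.loads(body.decode("utf-8"))
    if mime in ("application/xml", "text/xml"):
        # Job definitions: return the parsed root element.
        return ET.fromstring(body)
    if mime.startswith("text/"):
        # Logs: plain text.
        return body.decode("utf-8")
    # Anything else (e.g. image/png job graphs): keep the raw bytes.
    return body


status = decode_body("application/json;charset=UTF-8", b'{"systemMode":"NORMAL"}')
print(status["systemMode"])
```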

How to Make a REST Request

Most languages provide a library or other method of making a REST request; anything that can make an HTTP connection should work. For example, we can use Python:

$ cat list_jobs.py
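import json
import urllib.request

# Ask the Oozie server for the list of workflow jobs (jobtype=wf)
req = urllib.request.Request('http://localhost:11000/oozie/v1/jobs?jobtype=wf')
with urllib.request.urlopen(req) as response:
    output = response.read()
# Pretty-print the JSON response
print(json.dumps(json.loads(output), indent=4, separators=(',', ': ')))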

 

The Python code makes a GET request to http://localhost:11000/oozie/v1/jobs and passes the parameter “jobtype” with a value of “wf”, which is the REST API call for returning a list of Oozie workflow jobs. It then prints the output to the console, as we can see below:

$ python list_jobs.py
{
    "offset": 1,
    "total": 1,
    "len": 50,
    "workflows": [
        {
            "status": "SUCCEEDED",
            "run": 0,
            "startTime": "Wed, 22 May 2013 21:28:54 GMT",
            "appName": "no-op-wf",
            "lastModTime": "Wed, 22 May 2013 21:28:54 GMT",
            "actions": [],
            "acl": null,
            "appPath": null,
            "externalId": null,
            "consoleUrl": "http://rkanter-MBP.local:11000/oozie?job=0000000-130522142644540-oozie-rkan-W",
            "conf": null,
            "parentId": null,
            "createdTime": "Wed, 22 May 2013 21:28:54 GMT",
            "toString": "Workflow id[0000000-130522142644540-oozie-rkan-W] status[SUCCEEDED]",
            "endTime": "Wed, 22 May 2013 21:28:54 GMT",
            "id": "0000000-130522142644540-oozie-rkan-W",
            "group": null,
            "user": "rkanter"
        }
    ]
}

 

As you can see, the output is returned as a JSON blob. We had one workflow named “no-op-wf”, and the response includes some other details about that job as well.
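
A consuming script would normally pull out just the fields it cares about. As a small sketch, the snippet below extracts the ID, name, and status of each workflow from a trimmed copy of the response above. The function name `summarize_workflows` is hypothetical; the field names come from the output shown above.

```python
import json

# Trimmed copy of the JSON response shown above.
sample = """{"total": 1, "offset": 1, "len": 50, "workflows": [
  {"id": "0000000-130522142644540-oozie-rkan-W", "appName": "no-op-wf",
   "status": "SUCCEEDED", "user": "rkanter"}]}"""


def summarize_workflows(payload):
    """Return (id, appName, status) for each workflow in a jobs listing."""
    data = json.loads(payload)
    return [
        (wf["id"], wf["appName"], wf["status"])
        for wf in data.get("workflows", [])
    ]


for job_id, name, status in summarize_workflows(sample):
    print(f"{job_id}  {name}  {status}")
```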

In the rest of the examples, we’ll be using the command-line program curl instead of Python. Using curl, we can specify the type of request with the -X argument, though the default is a GET request. The previous query made with curl would look like this:

$ curl "http://localhost:11000/oozie/v1/jobs?jobtype=wf"
{"total":1,"workflows":[{"appPath":null,"acl":null,"status":"SUCCEEDED","createdTime":"Wed, 22 May 2013 21:28:54 GMT","conf":null,"lastModTime":"Wed, 22 May 2013 21:28:54 GMT","run":0,"endTime":"Wed, 22 May 2013 21:28:54 GMT","externalId":null,"appName":"no-op-wf","id":"0000000-130522142644540-oozie-rkan-W","startTime":"Wed, 22 May 2013 21:28:54 GMT","parentId":null,"toString":"Workflow id[0000000-130522142644540-oozie-rkan-W] status[SUCCEEDED]","group":null,"consoleUrl":"http:\/\/rkanter-MBP.local:11000\/oozie?job=0000000-130522142644540-oozie-rkan-W","user":"rkanter","actions":[]}],"len":50,"offset":1}

 

The output is the same, but because curl doesn’t “pretty-print” it, it is harder for humans to read. However, it should be fine for any consuming programs or scripts.

To make a PUT request using curl, we would do this:

$ curl -i -X PUT "http://localhost:11000/oozie/v1/job/0000000-130524111605784-oozie-rkan-W?action=kill"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Length: 0
Date: Fri, 24 May 2013 18:22:34 GMT

 

In this example, we told Oozie to kill the job with ID 0000000-130524111605784-oozie-rkan-W. Note that we specified -X PUT to do a PUT request instead of a GET request. We also specified the optional -i to make curl output the response headers to make it easier to see that the request succeeded. 

And to make a POST request using curl, we would do this:

$ cat config.xml

<configuration>
    <property>
        <name>user.name</name>
        <value>rkanter</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>${nameNode}/user/${user.name}/${examplesRoot}/apps/no-op</value>
    </property>
    <property>
        <name>queueName</name>
        <value>default</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>hdfs://localhost:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>localhost:8021</value>
    </property>
    <property>
        <name>examplesRoot</name>
        <value>examples</value>
    </property>
</configuration>

$ curl -X POST -H "Content-Type: application/xml" -d @config.xml "http://localhost:11000/oozie/v1/jobs?action=start"
{"id":"0000009-130524111605784-oozie-rkan-W"}

 

In this example, we submitted and started the workflow located at ${nameNode}/user/${user.name}/${examplesRoot}/apps/no-op where the Oozie server will resolve ${nameNode}, ${user.name}, and ${examplesRoot} to hdfs://localhost:8020, rkanter, and examples respectively; these are defined in the above config.xml file as well. 

Note that this time we specified -X POST to make a POST request. The -d argument lets us pass the data, in this case the XML file describing the job we want to submit (equivalent to the job.properties we would use with the Oozie client to submit a job). 

We could have also specified the XML directly on the command line to the -d argument instead of using @filename, but with the file we can keep the pretty-printed XML, and we don’t have to worry about escaping characters that the shell would otherwise try to interpret. We also had to specify -H "Content-Type: application/xml" to let the Oozie server know that we are sending XML instead of regular text.
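
For scripts that submit jobs without a config.xml file on disk, the same payload can be generated in code. The following Python 3 sketch serializes a dictionary of properties into the <configuration> XML format shown above and builds (but does not send) the equivalent POST request. The function names `build_config_xml` and `build_submit_request` are hypothetical, and the property values are copied from the config.xml example.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_config_xml(properties):
    """Serialize a dict of job properties as <configuration> XML (bytes)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="utf-8")


def build_submit_request(base_url, properties):
    """Build (without sending) the POST that submits and starts a job."""
    return urllib.request.Request(
        base_url + "/v1/jobs?action=start",
        data=build_config_xml(properties),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


props = {
    "user.name": "rkanter",
    "oozie.wf.application.path": "${nameNode}/user/${user.name}/${examplesRoot}/apps/no-op",
    "queueName": "default",
    "nameNode": "hdfs://localhost:8020",
    "jobTracker": "localhost:8021",
    "examplesRoot": "examples",
}
req = build_submit_request("http://localhost:11000/oozie", props)
print(req.get_method(), req.full_url)
```

Sending `req` with `urllib.request.urlopen()` should return the same kind of JSON response as the curl command, containing the new job’s ID.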

Using REST with Security

The REST API also works when Kerberos is enabled on our Oozie server. We simply have to make sure that we’ve acquired Kerberos credentials (e.g. kinit) before trying to connect to the server:

$ kinit -kt rkanter.keytab rkanter
$ curl --negotiate -u foo:bar http://localhost:11000/oozie/v1/admin/status
{"systemMode":"NORMAL"}

 

Depending on the tool we’re using to connect, additional arguments might need to be specified. For example, to use curl, as we can see above, we have to specify the --negotiate and -u arguments. The username and password we specify with -u don’t matter because we’re using Kerberos, so we can put whatever we want (e.g. foo:bar, or even just :). If we omit -u, we’ll get a 401 Unauthorized error, even though its value is not actually being used.

If our Oozie server is configured to use HTTPS (SSL), then we simply have to replace “http://” with “https://” and port “11000” with “11443”. Some tools may automatically follow the redirect, so we could continue using “http://” and “11000”, but others, like anything Java-based, will not. Note that if our Oozie server is using a self-signed certificate, some tools will also complain that the connection cannot be trusted, or give a similar message. Java doesn’t have an option to ignore this, so we’d have to provide the certificate, but some tools do. For example, with curl, specifying -k ignores this:

$ curl -k https://localhost:11443/oozie/v1/admin/status
{"systemMode":"NORMAL"}
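
The same two steps, switching the URL to HTTPS and relaxing certificate checks, can be sketched in Python 3 with the standard library. The helper names `to_https` and `insecure_context` are hypothetical, and the URL rewrite assumes a plain host name rather than an IPv6 literal. Like curl -k, the second helper disables certificate verification entirely, so it is only appropriate for testing against a self-signed certificate.

```python
import ssl
import urllib.parse


def to_https(url, https_port=11443):
    """Rewrite an http:// Oozie URL to https:// on the given port."""
    parts = urllib.parse.urlsplit(url)
    netloc = f"{parts.hostname}:{https_port}"
    return urllib.parse.urlunsplit(
        ("https", netloc, parts.path, parts.query, parts.fragment)
    )


def insecure_context():
    """SSL context that skips certificate checks, like curl -k. Testing only."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


url = to_https("http://localhost:11000/oozie/v1/admin/status")
print(url)
```

The request itself would then be made with `urllib.request.urlopen(url, context=insecure_context())`. A safer alternative is `ssl.create_default_context(cafile=...)` pointed at the server’s certificate, which keeps verification enabled.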

 

One Final Helpful Hint

Besides looking at the Oozie Web Services API documentation, it is also quite useful to enable debug mode on the Oozie client; doing so will cause the client to print out information about the exact REST request it’s making to the Oozie server and is a good way to experiment with how to use the API. 

To enable debug mode, enter the following in your terminal session:

export OOZIE_DEBUG=1

 

For example, once debug mode is enabled, the output will include information like this:

$ oozie job -config examples/apps/no-op/job.properties -run
POST http://localhost:11000/oozie/v1/jobs?action=start
<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>user.name</name><value>rkanter</value></property>
<property><name>oozie.wf.application.path</name><value>${nameNode}/user/${user.name}/${examplesRoot}/apps/no-op</value></property>
<property><name>queueName</name><value>default</value></property>
<property><name>nameNode</name><value>hdfs://localhost:8020</value></property>
<property><name>jobTracker</name><value>localhost:8021</value></property>
<property><name>examplesRoot</name><value>examples</value></property>
</configuration>
job: 0000000-130522142644540-oozie-rkan-W

 

Notice that the Oozie client made a POST request to http://localhost:11000/oozie/v1/jobs?action=start with the contents of job.properties, converted to the XML shown above, as the payload. This looks very similar to the POST request we made with curl earlier, just without the nice formatting.

Conclusion

This blog post only scratches the surface of what can be done using Oozie’s REST API. Anything the Oozie client can do can be done with the REST API, which means that there’s a rich set of controls for interacting with Oozie that can be leveraged by any custom application, dashboard, script, or other workflow engine: We can turn just about anything into an Oozie client. A great example of this is Hue, which uses the REST API to view, manage, and submit jobs to Oozie. 

The full Web Services API documentation, which lists every command and has more examples, can be found here for CDH4: http://archive.cloudera.com/cdh4/cdh/4/oozie/WebServicesAPI.html. (Note: The documentation included with earlier versions of Oozie was missing some commands and was not always correct; this was addressed by OOZIE-1183 and included in CDH 4.3.)

Robert Kanter is a Software Engineer on the Platform team and an Apache Oozie Committer/PMC Member.
