Author Archives: Robert Kanter

How-to: Use the ShareLib in Apache Oozie (CDH 5)

Categories: How-to Oozie

The internals of Oozie’s ShareLib have changed recently (reflected in CDH 5.0.0). Here’s what you need to know.

In a previous blog post about one year ago, I explained how to use the Apache Oozie ShareLib in CDH 4. Since that time, things have changed about the ShareLib in CDH 5 (particularly directory structure), so some of the previous information is now obsolete. (These changes went upstream under OOZIE-1619.) 

In this post I’ll explain those changes.

Read More

How-to: Use cron-like Scheduling in Apache Oozie

Categories: How-to Oozie

Improved scheduling capabilities via Oozie in CDH 5 makes for far fewer headaches.

One of the best new Apache Oozie features in CDH 5, Cloudera’s software distribution, is the ability to use cron-like syntax for coordinator frequencies. Previously, the frequencies had to be at fixed intervals (every hour or every two days, for example) – making scheduling anything more complicated (such as every hour from 9am to 5pm on weekdays or the second-to-last day of every month) complex and difficult. 

Read More

Inside Apache Oozie HA

Categories: Oozie Ops and DevOps

Oozie’s new HA qualities help cluster operators sleep well at night. Here’s how it works.

One of the big new features in CDH 5 for Apache Oozie is High Availability (HA). In designing this feature, the Oozie team at Cloudera had two main goals: 1) Don’t change the API or usage patterns, and 2) the user shouldn’t even have to know that HA is enabled. In other words, we wanted Oozie HA to be as easy and transparent as possible. 

Read More

How-to: Shorten Your Oozie Workflow Definitions

Categories: How-to Oozie

While XML is very good for standardizing the way Apache Oozie workflows are written, it’s also known for being very verbose. Unfortunately, that means that for workflows that have many actions, your workflow.xml can easily become quite long and difficult to manage and read. Cloudera is constantly making improvements to address this issue, and in this how-to, you’ll get a quick look at some of the current features and tricks that you can use to help shorten your Oozie workflow definitions.

Read More

How-to: Write an EL Function in Apache Oozie

Categories: How-to Oozie

When building complex workflows in Apache Oozie, it is often useful to parameterize them so they can be reused or driven from a script, and more easily maintained. The most common method is via ${VAR} variables. For example, instead of specifying the same NameNode for all of your actions in a given workflow, you can specify something like ${myNameNode}, and then in your job.properties file, you would define it like myNameNode=hdfs://localhost:8020.

One of the advantages of that approach is that if you want to change the variable (the NameNode in this example),

Read More