In this installment of “Meet the Project Founder”, meet Apache Oozie PMC member (and ASF member) Alejandro Abdelnur, the Cloudera software engineer who founded the project that entered the Apache Incubator in 2011 as Apache Oozie. Alejandro is also on the PMC of Apache Hadoop.
What led you to your project idea(s)?
Back in 2008, while I was working at Yahoo! in Bangalore, we began to notice that other teams were taking a variety of manual, ad hoc approaches (shell scripts, JobControl, Ant, and so on) to managing multiple Hadoop jobs. There was clearly an opportunity to build a single solution that everyone could use and that could be supported much more efficiently internally.
My team built one of those ad hoc systems to process the ingestion of partner feeds. This system was called Pac-Man, and it abstracted the multiple steps required to build data pipelines via a server-based workflow engine. Leveraging what we learned while developing Pac-Man, and after talking with other teams, we designed and built Oozie, a general-purpose workflow system for Hadoop.
After an extensive internal evaluation, Oozie became the default solution for workflow coordination internally. In May 2010, Yahoo! open-sourced Oozie’s source code. Some time later, I left Yahoo! and came to Cloudera. In 2011, Oozie became an Apache Incubator project, and it graduated to a top-level project in 2012.
Aside from doing the initial commit, what is your definition of the project founder’s role across the lifespan of the project — benevolent dictator, referee, silent partner?
I see the project founder primarily as a “first shipper” – just another developer on the project. Secondarily, the founder is the caretaker of the original project vision and keeps everyone pointed in the right direction. You have to be careful, though, because there is always some emotional attachment there, and your instinct could be to be overly protective. Instead, you need to be open to things that add value to the project, even if you don’t agree with them 100 percent. That’s the whole point of being a community.
What has surprised you the most about how your projects have evolved/matured?
The stability, scalability, and reliability achieved by the community in such a short time have all surprised me. But the best thing of all has been backward compatibility – the Oozie developers have done an awesome job there. We have an almost perfect record on that point, which is not something you see from most projects.
I think the flexibility of the implementation has also surprised a lot of people, even us. It lets us quickly integrate new technologies. Adding support for YARN and HCatalog in Oozie was a relatively simple task, for example.
What is the major work yet to be done, from your perspective as a project founder?
The developer community needs to get bigger and more diverse. Oozie also needs to be easier to use and to support more use cases. For example, Oozie does a good job of supporting synchronous processing, where data arrives on a regular schedule and in expected amounts. It needs to get better at asynchronous processing, where data is handled as it becomes available (on demand, basically). We’re working on that.
What is your philosophy, if you have one, for balancing quality versus quantity with respect to contributions?
Quality always comes first. We always focus more on making sure that things work right than on dumping half-baked features into the project and destabilizing it. My philosophy is to always take baby steps – take a lot of them, and make sure they’re tiny ones.
Do you have any other advice for potential project founders?
Stay patient and be open to new ideas. As I said earlier, you may not agree with them completely, but sometimes you have to set aside your own reservations for the good of the project and the community.