Meet the Project Founder: Tom White

In this new installment of our “Meet the Project Founder” series, meet Tom White, founder of Apache Whirr, PMC Member for multiple other projects (Apache Hadoop, Apache Avro, Apache Bigtop, Apache Sqoop), and author of O’Reilly Media’s best-selling book, Hadoop: The Definitive Guide.

What led you to your project idea(s)?

Whirr grew out of some scripts I had written in 2006 for spinning up Hadoop clusters on Amazon EC2. At the time, I didn’t have access to a cluster to run Hadoop (even getting a handful of machines was a challenge), so when Amazon announced its EC2 service, I knew that it would be a great way for individuals and organizations without plentiful hardware resources to run Hadoop.

While it was pretty easy to start up a collection of machines on EC2, installing Hadoop on those machines was tricky (e.g., configuring hostnames was, and remains, an art), so the case for having code do the work was compelling. Whirr was a step beyond those early scripts in that it provides a Java API for provisioning clusters on any cloud (EC2, Rackspace, etc.), and not just for Hadoop either: it comes with a simple service provider API for those who want to write new services. A recent example of this is the service for launching Cloudera Manager via Whirr.

Aside from doing the initial commit, what is your definition of the project founder’s role across the lifespan of the project? Benevolent dictator, referee, silent partner?

Early on in the project’s life the founder helps set the direction of the project. The thing I found hardest to get right is understanding what belongs in the project and what doesn’t. Not everyone on the project agrees about this, and since the work done on open source projects is done by volunteers, you can’t just tell everyone what the scope is. You have to make your case, and get agreement from the people who are going to be doing some of the work. If they don’t agree, then they’ll go to a different project!

There have been many cases in Hadoop itself where features that were considered to be “core” have, over time, moved to new projects outside the core. I see this as a natural result of maturity as the project community grows to understand what the project’s core mission is, and learns to focus on that.

Early on in a project, things are fuzzier. For example, Hadoop used to have packaging code for RPMs and Debian packages; that function is now in a separate project (Bigtop). Eclipse integration has moved to Apache Hadoop Developer Tools, cloud scripts to Whirr, and so on. All of these are positive developments since they allow a group that is interested in that particular area to focus on making that piece better.

The founder has to learn when to step back, too. At the beginning the founder will typically answer every question that comes to the user mailing list. However, at some point, you have to let others help out. I remember another project founder telling me how he remembers the first day he deliberately didn’t immediately answer a question, so he could let others jump in. Someone else did answer the question, and on that day, the project’s community grew a little bit — just through that small act.

What has surprised you the most about how your project has evolved/matured?

Whirr is a small project in the Hadoop ecosystem, but I’m pleasantly surprised whenever I read about a project that uses Whirr in some way. I’m also proud to work with folks who participate in related open source cloud projects; for example, several of the jclouds contributors and committers are a part of the Whirr community. It was nice to see jclouds enter the Apache Incubator earlier this year. And Andrei Savu, who has been one of the most prolific Whirr contributors, started a new Apache Incubator project, Provisionr, for reliable cluster provisioning in the cloud.

What is your philosophy, if you have one, for balancing quality versus quantity with respect to contributions?

I think the best way of ensuring both quality and quantity of contributions is to be very encouraging of new contributors. You need to make their experience of contributing a patch as pleasant as possible. This means responding to their contributions in a timely fashion, and providing good quality feedback on their patches. This practice helps build a virtuous circle: contributors who have successfully contributed one patch are likely to come back and contribute more, because it is enjoyable and a low-friction process. Over time they will become committers, who will review other people’s patches, and continue the cycle.

Do you have any other advice for potential project founders?

I would reiterate the importance of building community if you want to build a successful project. I think Henri Yandell (former ASF Board Member) captured it well: “Projects begin by thinking they’re in the software engineering business; after a while, they realize they’re in the recruiting business.”
