Let a Thousand Hadoop How-Tos Bloom


History teaches us that ecosystem growth is fueled by enthusiasm, tools (including frameworks and APIs), and knowledge in roughly equal measure. To this point, the Apache Hadoop ecosystem has been blessed with the first two ingredients – thanks to the magic of open source – but in the third category, there is still plenty of work to be done.

For Cloudera, our Academic Partnership program is a major part of that effort. Through that program, accredited nonprofit universities around the world get access to Cloudera’s own Hadoop curriculum for their computer science departments, in addition to discounted training and certification for students and instructors. Thus far, Cloudera Academic Partners include (but are not limited to) San Jose State University, DePaul University, Fordham University, Vanderbilt University, Technische Universität Berlin, and Rensselaer Polytechnic Institute.

Recently, a wonderful side effect has emerged: knowledge and experience gained through an Academic Partnership are being contributed back to the ecosystem!

The University of St. Thomas (UST) in St. Paul, Minnesota, is such a partner through its Center of Excellence for Big Data (CoE4BD). UST’s software engineering faculty are no strangers to Hadoop; they have been doing research, teaching courses, and interacting with local companies using Hadoop ecosystem technologies for several years. However, that effort has been throttled by a lack of complete documentation in the open ecosystem. According to Brad Rubin, an associate professor in UST’s Graduate Programs in Software, “While we have found excellent references available on many topics, we have also found some gaps in short, focused how-to docs for common questions and use cases.”

Just recently, the CoE4BD staff and students started to contribute what they have learned back to the community in the form of Hadoop how-tos in a GitHub repository. Furthermore, they are releasing longer-form information derived from faculty research and student projects in a sister repo. Consumers of this content may ask questions and seek advice about Hadoop concepts through Cloudera’s community forum. From time to time, you will also see Cloudera re-publish CoE4BD-produced how-tos on this very blog. (See our most popular how-tos to date here.)

Rubin says that UST’s status as a Cloudera Academic Partner has directly led to topic ideas (which will grow over time) and that technical support from Cloudera has helped overcome stumbling blocks. We can think of no better use case for an Academic Partnership!

For more information about the CoE4BD, point your browser to www.stthomas.edu/coe4bd or send them an email.

Justin Kestelyn is Cloudera’s developer community outreach director.