Cloudera Blog · Bigtop Posts
This installment of “Meet the Project Founder” features Apache Bigtop founder and PMC Chair/VP Roman Shaposhnik.
What led you to your project idea(s)?
Conceptually, Apache Bigtop can be traced back to my time at Sun Microsystems in 2007-2008. I was assisting the team responsible for creating a 100% community-driven, open source Solaris distribution (which eventually became OpenSolaris) that could also serve as the basis for an enterprise-grade commercial product offering. I then joined Yahoo! Inc. as manager of a small team of extremely talented engineers tasked with integration efforts around Yahoo’s internal cloud offering based on Hadoop. Our project was called HIT (Hadoop Integration Testing) and we were known as “HIT-men”.
In this installment, meet Cloudera Software Engineer Mark Grover (@mark_grover).
What do you do at Cloudera and in which Apache project are you involved?
I’m a Software Engineer at Cloudera, involved mostly with Apache Bigtop, an open source project aimed at building a community around packaging and interoperability testing of projects in the Apache Hadoop ecosystem. In addition, I contribute to Apache Hive, a data warehousing system built on top of Apache Hadoop that allows users to structure and query their Hadoop data using familiar SQL-like syntax. I have also written a section in O’Reilly’s book on Hive, Programming Hive.
The following post was originally published via apache.org. We bring it to you here in a slightly modified form.
We hope you all had a wonderful and restful holiday season and wish you all the very best for 2013! We are pleased to announce the release of Apache Bigtop 0.5.0!
Apache Bigtop, as many of you might already know, is a project for the development of packaging and tests for the Apache Hadoop ecosystem. You can learn more about it by reading one of our earlier blog posts on Apache Blogs.
Ever since Cloudera decided to contribute the code and resources for what would later become Apache Bigtop (incubating), we’ve been answering a very basic question: what exactly is Bigtop and why should you or anyone in the Apache (or Hadoop) community care? The earliest and most succinct answer (the one used for the Apache Incubator proposal) simply stated that “Bigtop is a project for the development of packaging and tests of the Hadoop ecosystem”. That is a nice explanation of how Bigtop relates to the rest of the Apache Software Foundation’s (ASF) Hadoop ecosystem projects, yet it doesn’t really help you understand the aspirations of Bigtop.
Cloudera was the first company to create an open source distribution that included Apache Hadoop, releasing the first version (CDH1) back in March 2009. The initial goal of CDH was to make Apache Hadoop easier to adopt by providing packaging that let users install Hadoop on popular Linux operating systems without having to compile from source.
In mid-2010, Cloudera announced a major change in CDH that eventually came to recast what defined an Apache Hadoop-based distribution. We observed that users were typically running not just Apache Hadoop but also a collection of other open source systems and components that were quickly becoming essential to a fully functioning data management system. But in order to run such a system, organizations needed to do a great deal of work: assembling and integrating sometimes as many as a dozen different components. Each open source component had its own release schedule, dependencies, interfaces and standards for quality.