Cloudera Developer Blog · Bigtop Posts
As announced last Sunday (Aug. 25) on the project mailing list, Apache Hadoop 2.1.0 is the first beta release for Hadoop 2. (See the Release Notes for the full list of new features and fixes.) Our congratulations to the Hadoop community for reaching this important milestone in the ongoing adoption of the core Hadoop platform!
With the release of this new beta, and the follow-on GA release on the horizon, we expect to see more customers exploring Hadoop 2 for production use cases. In fact, the upcoming CDH5 beta will be based on the Hadoop 2 GA release, delivering features that we’ve thoroughly tested against enterprise requirements, including (but not limited to):
Just in time for Hadoop Summit 2013, the Apache Bigtop team is very pleased to announce the release of Bigtop 0.6.0: the very first release of a fully integrated Big Data management distribution built on the most advanced Hadoop 2.x release currently available, Hadoop 2.0.5-alpha.
Bigtop, as many of you might already know, is a project aimed at creating a 100% open source and community-driven Big Data management distribution based on Apache Hadoop. (You can learn more about it by reading one of our previous blog posts on Apache Blogs.) Bigtop also plays an important role in CDH, which uses Bigtop's packaging code. Cloudera takes pride in developing open source packaging code and contributing it back to the community.
Astute readers of this blog will notice that, given our quarterly release schedule, Bigtop 0.6.0 should have been called Bigtop 0.7.0. It is true that we skipped a quarter. Our excuse is that we spent all this extra time helping the Hadoop community stabilize the Hadoop 2.x code line and make it a robust kernel for all the applications that are now part of the Bigtop distribution.
This installment of “Meet the Project Founder” features Apache Bigtop founder and PMC Chair/VP Roman Shaposhnik.
What led you to your project idea(s)?
Conceptually, Apache Bigtop can be traced as far back as my time at Sun Microsystems in 2007-2008. I was assisting the team responsible for coming up with a 100% community-driven, open source Solaris distribution that could also be used as a basis for an enterprise-grade commercial product offering (which eventually became OpenSolaris). I then joined Yahoo! Inc. as the manager of a small team of extremely talented engineers tasked with integration efforts around Yahoo’s internal cloud offering based on Hadoop. Our project was called HIT (Hadoop Integration Testing) and we were known as “HIT-men”.
In this installment, meet Cloudera Software Engineer/Apache Bigtop Committer Mark Grover (@mark_grover).
What do you do at Cloudera and in which Apache project are you involved?
I’m a Software Engineer at Cloudera, involved mostly with Apache Bigtop, an open source project aimed at building a community around packaging and interoperability testing of projects in the Apache Hadoop ecosystem. In addition, I contribute to Apache Hive, a data warehousing system built on top of Apache Hadoop that allows users to structure and query their Hadoop data using familiar SQL-like syntax. I have also written a section in O’Reilly’s book on Hive, Programming Hive.
The following post was originally published via apache.org. We bring it to you here in a slightly modified form.
We hope you all had a wonderful and restful holiday season and wish you all the very best for 2013! We are pleased to announce the release of Apache Bigtop 0.5.0!
Apache Bigtop, as many of you might already know, is a project for the development of packaging and tests for the Apache Hadoop ecosystem. You can learn more about it by reading one of our earlier blog posts on Apache Blogs.
Ever since Cloudera decided to contribute the code and resources for what would later become Apache Bigtop (incubating), we’ve been answering a very basic question: what exactly is Bigtop and why should you or anyone in the Apache (or Hadoop) community care? The earliest and most succinct answer (the one used for the Apache Incubator proposal) simply stated that “Bigtop is a project for the development of packaging and tests of the Hadoop ecosystem”. That is a nice explanation of how Bigtop relates to the rest of the Apache Software Foundation’s (ASF) Hadoop ecosystem projects, yet it doesn’t really help you understand the aspirations of Bigtop.
Cloudera was the first company to create an open source distribution that included Apache Hadoop, releasing the first version (CDH1) back in March 2009. The initial goal of CDH was to make Apache Hadoop easier to adopt by providing packages that let users install Hadoop on popular Linux operating systems without having to compile it from source.
In mid-2010, Cloudera announced a major change in CDH that eventually redefined what an Apache Hadoop-based distribution is. We observed that users were typically running not just Apache Hadoop but also a collection of other open source systems and components that were quickly becoming essential to a fully functioning data management system. But in order to run such a system, organizations needed to do a great deal of work: assembling and integrating as many as a dozen different components. Each open source component had its own release schedule, dependencies, interfaces, and standards for quality.