Cloudera Blog · Community Posts
This installment of “Meet the Project Founder” features Apache Bigtop founder and PMC Chair/VP Roman Shaposhnik.
What led you to your project idea(s)?
Conceptually, Apache Bigtop can actually be traced as far back as me working at Sun Microsystems in 2007-2008. I was assisting the team responsible for coming up with a 100% community-driven, open source Solaris distribution that could also be used as a basis for an enterprise-grade commercial product offering (which eventually became OpenSolaris). I then joined Yahoo! Inc. as a manager of a small team of extremely talented engineers tasked with integration efforts around Yahoo’s internal cloud offering based on Hadoop. Our project was called HIT (Hadoop Integration Testing) and we were known as “HIT-men”.
Contributing to Apache Hadoop or writing custom pluggable modules requires modifying Hadoop’s source code. While it is perfectly fine to use a text editor to modify Java source, modern IDEs simplify navigation and debugging of large Java projects like Hadoop significantly. Eclipse is a popular choice thanks to its broad user base and multitude of available plugins.
This post covers configuring Eclipse to modify Hadoop’s source. (Developing applications against CDH using Eclipse is covered in a different post.) Hadoop has changed a great deal since our previous post on configuring Eclipse for Hadoop development; here we’ll revisit configuring Eclipse for the latest “flavors” of Hadoop. Note that trunk and other release branches differ in their directory structure, feature set, and build tools they use. (The EclipseEnvironment Hadoop wiki page is a good starting point for development on trunk.)
This post covers the following main flavors:
The schedule/agenda grid for HBaseCon 2013 (rapidly approaching: June 13 in San Francisco) is a thing of beauty.
If you lacked motivation to register up until this point, we think that this session line-up will convince you otherwise. We repeat: whether you’re an HBase committer or just getting started (or at any level in between), HBaseCon is simply an event that you can’t afford to miss – and with an entry fee of just $350, it’s also one you can easily afford.
HBaseCon 2013 is approaching fast – June 13 in San Francisco. If you’re on the fence about attending – or perhaps your manager is on the fence about approving your participation – here are a few things that you/they need to know (in no particular order):
- HBaseCon is the annual rallying point for the HBase community. If you’ve ever had a desire to learn how to get involved in the community as a contributor, or just want to ask a committer or PMC member why things are done (or not done) a certain way, this is your opportunity – because this is where those people are. Participating in a mailing list thread is never quite the same once you’ve met the people behind it.
- HBaseCon is a one-stop shop for learning about the HBase roadmap, as well as other projects across the ecosystem. Current HBase users should be particularly interested in learning about which JIRAs will have the most impact on the user experience – and once again, most of the committers working on those JIRAs will either be leading sessions or otherwise present. Plus, you can learn about how new complementary projects like Impala, Kiji, Phoenix, and Honeycomb are transforming the use cases for HBase and helping to expand its footprint across the enterprise.
- HBaseCon is a feast of real-world experiences and use cases. Sure, maybe you’ve read about the HBase-backed applications used by companies like Facebook, Salesforce.com, eBay, Pinterest, and Yahoo!. But wouldn’t it be helpful to hear technical details and best practices directly from the people who built and run them? I’ll bet it would. And you really can’t do that anywhere else — in the whole world. (Plus, you can take advantage of formal training right before the conference, at a discount.)
- HBaseCon is a pageant of engineer rock-stars. If your company is an HBase user and hungry for talent, there’s no better place to find it: HBaseCon is literally the world’s biggest gathering of HBase experts under one roof.
- HBaseCon is a heck of a blast. Come for the deep-dives and advice, stay for the after-event party. The libations will be extensive!
If you have any interest in HBase whatsoever, whether as a user or prospective user, missing HBaseCon is almost unthinkable.
At Cloudera, there is a long and proud tradition of employees creating new open source projects intended to help fill gaps in platform functionality (in addition to hiring new employees who have done so in the past). In fact, more than a dozen ecosystem projects — including Apache Hadoop itself — were founded by Clouderans, more than can be attributed to employees of any other single company. Cloudera was also the first vendor to ship most of those projects as enterprise-ready bits inside its platform.
We thought you might be interested in meeting some of them over the next few months, in a new “Meet the Project Founder” series. It’s only appropriate that we begin with Doug Cutting himself – Cloudera’s chief architect and the quadruple-threat founder of Apache Lucene, Apache Nutch, Apache Hadoop, and Apache Avro.
What led you to your project idea(s)?
HBaseCon (hosted by Cloudera), now in its second year, is THE community event for Apache HBase contributors, developers, admins, and users. There is no better place to dive head-first into use cases, best practices, internals, and futures as well as to meet the rest of the community.
Today Cloudera announced a new Cloudera Academic Partnership program, in which participating universities worldwide get access to curriculum, training, certification, and software.
As noted in the press release, the global demand for people with Apache Hadoop and data science skills is dwarfing all supply. We consider it an important mission to help accredited universities meet that demand, by equipping them with the content and training they need to educate students in the Hadoop arts.
Furthermore, we are cognizant of the fact that many academic research labs are in need of tools to help deploy, manage, and extend Hadoop clusters. For that reason, CAP members get free access to Cloudera Manager Enterprise Edition for 12 months to support data-intensive testing, development, and research.
It’s only Rock and Roll, but I like it!
– Mick Jagger
Copyright is having a tough time in the digital age. New copies of music, movies and software can be created at near zero cost. Some wonder whether it still makes sense to ever charge for content. Over the past century large industries have developed that sell content. These industries resist change. We consumers love our content, but don’t love paying for it. But would all the content we love still exist without payment for copyright?
One solution might be to replace sales of copyrighted material with services that provide access to the content. We could buy tickets to concerts, a service provided by musicians. We could enjoy streaming music services via subscriptions or supported by advertising. Similarly, we could access software as a service in the cloud. Some companies, like Google, create proprietary software yet don’t make money off it by selling its copyright, but only through services. That’s a great model, but is it the only way forward?
It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to meetups and conferences near and far to do their part in the community to make Apache Hadoop a household word!
(Remember, we’re always ready to assist your meetup by providing speakers, sponsorships, and schwag.)
A couple highlights:
Cloudera will be a proud exhibitor at O’Reilly OSCON 2013 (July 22-26 in Portland, OR), which in our opinion is a shining light in the open source community. So be sure to look for us!
We also want to take this opportunity to congratulate all speakers who will be presenting at OSCON. Furthermore, we want to highlight the talks led by Clouderans for your personal schedule: