Cloudera Blog · Hadoop Posts
At Cloudera, there is a long and proud tradition of employees creating new open source projects to help fill gaps in platform functionality (and of hiring people who have already done so). In fact, more than a dozen ecosystem projects, including Apache Hadoop itself, were founded by Clouderans, more than can be attributed to employees of any other single company. Cloudera was also the first vendor to ship most of those projects as enterprise-ready bits inside its platform.
We thought you might be interested in meeting some of those founders over the next few months, in a new “Meet the Project Founder” series. It’s only appropriate that we begin with Doug Cutting himself, Cloudera’s chief architect and the quadruple-threat founder of Apache Lucene, Apache Nutch, Apache Hadoop, and Apache Avro.
What led you to your project idea(s)?
As Cloudera’s keeper of customer stories, it’s dawned on me that others might benefit from the information I’ve spent the past year collecting: the many use cases and deployment patterns for Hadoop amongst our customer base.
This week I’d like to highlight Nokia, a global company that we’re all familiar with as a large mobile phone provider, and whose Senior Director of Analytics – Amy O’Connor – will be speaking at tomorrow’s Cloudera Sessions event in Boston.
Fun fact: Nokia has been in business for more than 150 years, starting with the production of paper in the 1800s. When I first met Amy O’Connor in early 2012, she explained to me that Nokia has always been in the business of transforming resources into useful products — from paper and rubber over a century ago, to the electronics and mobile devices we’re familiar with today.
Today Cloudera announced a new Cloudera Academic Partnership program, in which participating universities worldwide get access to curriculum, training, certification, and software.
As noted in the press release, the global demand for people with Apache Hadoop and data science skills far outstrips supply. We consider it an important mission to help accredited universities meet that demand, by equipping them with the content and training they need to educate students in the Hadoop arts.
Furthermore, we know that many academic research labs need tools to help deploy, manage, and extend Hadoop clusters. For that reason, CAP members get free access to Cloudera Manager Enterprise Edition for 12 months to support data-intensive testing, development, and research.
It’s always a great thing for everybody when the experts are willing and eager to share.
So, it’s with special pleasure that I can point you toward a new three-part series by Cloudera’s own Tom White (@tom_e_white) to be published in Dr. Dobb’s, which has long been one of the publications of record in the mainstream developer world, and from which many early programmers learned basics like BASIC. Now, Dobb’s turns its attention to Apache Hadoop, which says a lot about Hadoop’s continuing adoption.
Tom, of course, is the author of the O’Reilly best-seller Hadoop: The Definitive Guide, and few people have a better record of being both knowledgeable and helpful for those who want to learn “how to Hadoop”.
It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to meetups and conferences near and far to do their part in the community to make Apache Hadoop a household word!
(Remember, we’re always ready to assist your meetup by providing speakers, sponsorships, and schwag.)
A couple of highlights:
Cloudera will be a proud exhibitor at O’Reilly OSCON 2013 (July 22-26 in Portland, OR), which in our opinion is a shining light in the open source community. So be sure to look for us!
We also want to take this opportunity to congratulate all speakers who will be presenting at OSCON. Furthermore, we want to highlight the talks led by Clouderans for your personal schedule:
On this special April 1 – the seven-year anniversary of the Apache Hadoop project’s first release – Hadoop founder Doug Cutting (also Cloudera’s chief architect and the Apache Software Foundation chair) offers seven thoughts on Hadoop:
- Open source accelerates adoption.
If Hadoop had been created as proprietary software, it would not have spread as rapidly. We’ve seen incredible growth in the use of Hadoop. Partly that’s because it’s useful. But many would have been hesitant to make a vendor-controlled platform part of their infrastructure, useful or not.
- Apache builds collaborative communities.
The Hadoop ecosystem has hundreds of developers working for tens of organizations. Competitors productively collaborate on a daily basis, improving the software we all share. The Apache Software Foundation gives us the methodology that enables this. (Thanks, Apache!)
- The timing is right.
In this Charlie Rose interview that aired on March 22, 2013, Cloudera’s Chief Scientist Jeff Hammerbacher (@hackingdata) offers fascinating insights into the origins of Big Data and data science techniques at Google and their re-implementation as open source software used by consumer Web companies. He also describes in detail their positive application across healthcare diagnostics and delivery, as well as the overall need for better balance between “numerical imagination” and “narrative imagination” in everything we do (in order to “ask bigger questions”, as some would say).
It’s an incredibly valuable look into where Big Data came from, where it’s going, and how Cloudera is helping it get there.
Hadoop Summit Europe is coming up in Amsterdam next week, so this is an appropriate time to make you aware of the Cloudera speaker program there (all three talks on Thursday, March 21):
Below you’ll find the official announcement from Cloudera and Twitter about Parquet, an efficient general-purpose columnar file format for Apache Hadoop.
Parquet is designed to bring efficient columnar storage to Hadoop. Compared to, and learning from, the initial work done toward this goal in Trevni, Parquet includes the following enhancements: