Cloudera Developer Blog · General Posts
Strata Conference + Hadoop World 2013 (Oct. 28-30 in New York City) approaches (register here for an automatic 20% discount), and that means it’s time to get your meetup schedule sorted out!
There are a variety of them planned across the week (something for everyone!), onsite at the conference hotel as well as offsite. Use the links below to RSVP.
(YES, there will be food, adult refreshments, and T-shirts.)
Welcome to our second edition of “This Month in the Ecosystem.” (See the inaugural edition here.) Although August was not as busy as July, there are some very notable highlights to report.
As announced last Sunday (Aug. 25) on the project mailing list, Apache Hadoop 2.1.0 is the first beta release for Hadoop 2. (See the Release Notes for the full list of new features and fixes.) Our congratulations to the Hadoop community for reaching this important milestone in the ongoing adoption of the core Hadoop platform!
With the release of this new beta, and the follow-on GA release on the horizon, we expect to see more customers exploring Hadoop 2 for production use cases. In fact, the upcoming CDH5 beta will be based on the Hadoop 2 GA release, delivering features that we’ve thoroughly tested against enterprise requirements, including (but not limited to):
Catherine Ray, a Summer Intern at Cloudera this year, was kind enough to summarize her experiences for you below. Best of luck in your new field, Catherine!
I’m currently 16 and a rising senior at George Mason University, majoring in Computational Physics. (The full title is Computational and Data Sciences with a concentration in Physics.)
I had a wonderful time working on my project. In short, I worked on an Apache Hadoop-based downloads tracking system. In this system, raw download logs are ingested via Apache Flume into HDFS, then parsed by a MapReduce job into a Cloudera Impala-friendly format. I had the opportunity to collaborate with one of our teams in New York to pull the whole system together. To fully utilize the data contained in the logs, I created a Java library that finds the organizational information associated with a given IP address. I also helped create dashboards that run queries against the collected data to analyze it and produce sales leads.
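The post does not say how the IP-to-organization library was built. One common way to implement this kind of lookup is a sorted map of address ranges, and the Java sketch below shows that approach under stated assumptions: the organization data is already available as non-overlapping, inclusive IPv4 ranges. The class `IpOrgLookup`, its methods, and the organization names are hypothetical, and the addresses come from the RFC 5737 documentation blocks.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical IPv4-range-to-organization lookup (illustrative sketch only). */
class IpOrgLookup {
    // End of a range (inclusive) plus the organization assumed to own it.
    private record Range(long end, String org) {}

    // Keyed by range start, so floorEntry finds the closest start <= address.
    private final TreeMap<Long, Range> ranges = new TreeMap<>();

    // Converts a dotted-quad IPv4 string to an unsigned 32-bit value stored in a long.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Registers an inclusive range; this sketch assumes ranges never overlap.
    void addRange(String startIp, String endIp, String org) {
        ranges.put(toLong(startIp), new Range(toLong(endIp), org));
    }

    // Returns the organization whose range contains ip, or null if none does.
    String lookup(String ip) {
        long addr = toLong(ip);
        Map.Entry<Long, Range> candidate = ranges.floorEntry(addr);
        if (candidate == null || addr > candidate.getValue().end()) {
            return null;
        }
        return candidate.getValue().org();
    }

    public static void main(String[] args) {
        IpOrgLookup db = new IpOrgLookup();
        // Documentation-only address blocks (RFC 5737) with made-up owners.
        db.addRange("192.0.2.0", "192.0.2.255", "Example Org A");
        db.addRange("198.51.100.0", "198.51.100.127", "Example Org B");
        System.out.println(db.lookup("198.51.100.42"));  // prints "Example Org B"
        System.out.println(db.lookup("198.51.100.200")); // prints "null" (address falls in a gap)
    }
}
```

Keying a `TreeMap` by range start means each lookup is a single `floorEntry` call (O(log n)) followed by an end-of-range check. A production version would also need IPv6 support and a real data source, neither of which this sketch attempts.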
The ecosystem is evolving so rapidly that important developments often pass through the public attention zone too quickly. Thus, we think it might be helpful to bring you a digest (by no means complete!) of our favorite highlights on a regular basis. (This effort, by the way, has different goals than the fine Hadoop Weekly newsletter, which takes a more expansive view and which you should subscribe to immediately, as far as we’re concerned.)
Find the first installment below. Although the time period reflected here is obviously more than a month long, we have some catching up to do before we can move to a truly monthly cadence.
The following guest post, from Mike Pittaro of Dell’s Cloud Software Solutions team, describes his team’s use of the Dell Crowbar tool in conjunction with the Cloudera Manager API to automate cluster provisioning. Thanks, Mike!
Deploying, managing, and operating Apache Hadoop clusters can be complex at all levels of the stack, from the hardware on up. Since 2011, to hide this complexity and reduce deployment time, Dell has used Dell Crowbar in conjunction with Cloudera Manager to deploy the Dell | Cloudera Solution for Apache Hadoop for joint customers.
Cloudera Manager does a great job of deploying and managing the Hadoop layers of a cluster, but it requires an operating system to be in place first. Dell Crowbar complements those capabilities: it is a complete automated operations platform designed to deploy layers of infrastructure starting from bare-metal servers and all the way up the stack.
Cloudera Impala has made huge progress since its initial announcement, and there’s even more good news on the roadmap. To learn more, plan to attend an Impala meetup hosted by Cloudera in its San Francisco offices on the evening of Aug. 20:
Five years ago today, on June 27, 2008, we filed the incorporation paperwork for Cloudera, Inc., a new company we created to bring the power of Google’s big data platform to the masses.
Back then, nobody was talking about “big data” and the only people who knew about Apache Hadoop were wild-eyed engineers working in the consumer internet. Today, the software is right at the center of a major new market in technology. It’s used by hospitals, energy companies, retailers, banks and others.
The past five years have been tremendous. My thanks and congratulations to Clouderans around the world, to our fantastic users, customers and partners, and to the open source community generally for all you’ve done to bring us this far. We’re off to a pretty good start!
In this installment of “Meet the Project Founder”, meet Apache Oozie PMC member (and ASF member) Alejandro Abdelnur, the Cloudera software engineer who founded what eventually became the Apache Oozie project in 2011. Alejandro is also on the PMC of Apache Hadoop.
What led you to your project idea(s)?
Back in 2008, while I was working at Yahoo! in Bangalore, we began to notice that other teams were taking a variety of manual, ad hoc approaches (shell scripts, JobControl, Ant, and so on) to managing multiple Hadoop jobs. There was clearly an opportunity to build a single solution that everyone could use and that could be supported much more efficiently internally.
We announced a leadership change at Cloudera today. Tom Reilly, formerly CEO at Arcsight, is joining us in my old role – CEO – and I am assuming two new posts: Chief Strategy Officer and Chairman of the Board of Directors.
When we started the company five years ago, almost no one had heard of Apache Hadoop. Big Data, to the extent the term was used at all, was strictly a consumer internet phenomenon. No other enterprise vendor believed the platform mattered.
We did, of course, and we set out to make that true. We’ve engaged closely with the open source community, worked hard to advance the state of the art in the platform and crafted a business strategy that allows us to grow quickly and to build a great company for the long term.