Cloudera Developer Blog · Community Posts
OSCON 2013 is already receding in the rear-view mirror, but we had a great time. Cloudera’s sessions were very well attended — with Tom Wheeler taking the prize (well over 200 attendees for his “Introduction to Apache Hadoop” tutorial) — but best of all was the opportunity to meet and mingle with people in the broader open source community. If you visited us at Booth 420, we hope your questions were adequately answered (most popular question: “Can you tell me more about Cloudera Impala?”) and that, having seen the QuickStart VM in our demo, you’ll now download and install it.
In my biased opinion, the crowning achievement was our ability to not only distribute a couple hundred “Data is the New Bacon” T-shirts within a 36-hour period, but to clean ourselves out of the meat-free version shortly thereafter, as well:
This is a great day for technical end users – developers, admins, analysts, and data scientists alike. Starting now, Cloudera complements its traditional mailing lists with new, feature-rich community forums intended for users of Cloudera’s Platform for Big Data! (Log in using your existing credentials or click the link to register.)
Although mailing lists have long been a standard for user interaction, and will undoubtedly continue to be, they have flaws. For example, they lack structure and taxonomy, which makes them difficult to consume. Search functionality is often less than stellar, and users are unable to build reputations that persist over an appreciable period of time. For these reasons, although they’re easy to create and manage, mailing lists inherently limit access to knowledge and hence limit adoption.
The new service brings key additions to the conversation: functionality, search, structure, and scalability. It is now considerably easier to ask questions, find answers (or questions to answer), follow and share threads, and create a visible and sustainable reputation in the community. And for Cloudera customers, there’s a bonus: your questions will be escalated as bona fide support cases under certain circumstances (see below).
Continuing the fine tradition of Clouderans contributing books to the Apache Hadoop ecosystem, Apache Sqoop Committers/PMC Members Kathleen Ting and Jarek Jarcec Cecho have officially joined the book author community: their Apache Sqoop Cookbook is now available from O’Reilly Media (with a pelican as the assigned cover beast).
The book arrives at an ideal time. Hadoop has quickly become the standard for processing and analyzing Big Data, and in order to integrate a new Hadoop deployment into your existing environment, you will very likely need to transfer data stored in legacy relational databases into your new cluster.
Sqoop is just the ticket; it optimizes data transfers between Hadoop and RDBMSs via a command-line interface offering some 60 parameters. This new cookbook focuses on applying those parameters to common use cases — one recipe at a time, Kate and Jarek guide you from basic commands that don’t require prior Sqoop knowledge all the way to very advanced use cases. The recipes are sufficiently detailed not only to let you deploy Sqoop in your environment, but also to understand its inner workings.
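For a sense of what one of those basic commands looks like, here is a minimal table import in the spirit of the book’s opening recipes. The JDBC URL, credentials, table name, and HDFS paths below are hypothetical placeholders, not examples taken from the book:

```shell
# Import a single table from a (hypothetical) MySQL database into HDFS.
# --connect, --table, and --target-dir are the core import parameters;
# --password-file keeps the database password out of your shell history.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --username dbuser \
  --password-file /user/dbuser/.password \
  --table employees \
  --target-dir /data/employees \
  --num-mappers 4
```

Under the hood, Sqoop turns this into a MapReduce job (here, four parallel map tasks splitting the table by its primary key), which is exactly the kind of inner working the recipes go on to explain.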
Below please find our regularly scheduled quarterly update about where to find tech talks by Cloudera employees this year – this time, for July through September 2013. Note that this list will be continually curated during the period; complete logistical information may not be available yet.
As always, we’re standing by to assist your meetup by providing speakers, sponsorships, and schwag!
| Date | Location | Event | Presentation |
| --- | --- | --- | --- |
| July 11 | Boston | Boston HUG | Solr Committer Mark Miller on Solr+Hadoop |
| July 11 | Santa Clara, Calif. | Big Data Gurus | Patrick Hunt on Solr+Hadoop |
| July 11 | Palo Alto, Calif. | Cloudera Manager Meetup | Phil Zeyliger on Cloudera Manager internals |
| July 11 | Kansas City, Mo. | KC Big Data | Matt Harris on Impala |
| July 17 | Mountain View, Calif. | Bay Area Hadoop Meetups | Patrick Hunt on Solr+Hadoop |
| July 22 | Chicago | Chicago Big Data | Hadoop and Lucene founder Doug Cutting on Solr+Hadoop |
| July 22 | Portland, Ore. | OSCON 2013 | Tom Wheeler on “Introduction to Apache Hadoop” |
| July 24 | Portland, Ore. | OSCON 2013 | Sqoop Committer Kate Ting on “Building an Impenetrable ZooKeeper” |
| July 24 | Portland, Ore. | OSCON 2013 | Jesse Anderson on “Doing Data Science On NFL Play by Play” |
| July 24 | Portland, Ore. | OSCON 2013 | Bigtop Committer Mark Grover on “Getting Hadoop, Hive and HBase up and running in less than 15 minutes” |
| July 24 | Portland, Ore. | OSCON 2013 | Hadoop Committer Colin McCabe on Locksmith |
| July 25 | San Francisco | SF Data Engineering | Wolfgang Hoschek on Morphlines |
| July 25 | Washington, DC | Hadoop-DC | Joey Echeverria on Accumulo |
| Aug. 14 | San Francisco | SF Hadoop Users | TBD, but we’re hosting! |
| Aug. 14 | Los Angeles | LA HBase Users Meetup | HBase Committer/PMC Chair Michael Stack on HBase |
| Aug. 29 | London | London Java Community | Hadoop Committer Tom White on CDK |
| Sept. 11 | San Francisco | Cloudera Sessions (SOLD OUT) | Eric Sammer-led CDK lab |
| Sept. 12 | New York | NYC Search, Discovery & Analytics Meetup | Solr Committer Mark Miller on Solr+Hadoop |
| Sept. 12 | Cambridge, UK | Enterprise Search Cambridge UK | Tom White on Solr+Hadoop |
| Sept. 12 | Los Angeles | LA Hadoop Users Group | Greg Chanan on Solr+Hadoop |
| Sept. 16 | Sunnyvale, Calif. | Big Data Gurus | Eric Sammer on CDK |
| Sept. 17 | Sunnyvale, Calif. | SF Large-Scale Production Engineering | Darren Lo on Hadoop Ops |
| Sept. 18 | Mountain View, Calif. | Silicon Valley JUG | Wolfgang Hoschek on Morphlines |
| Sept. 19 | El Dorado Hills, Calif. | NorCal Big Data | Apache Bigtop Committer Sean Mackrory on Bigtop & QuickStart VM |
| Sept. 24 | Washington, DC | Hadoop-DC | Doug Cutting on Apache Lucene |
In this installment of “Meet the Project Founder”, meet Apache Oozie PMC member (and ASF member) Alejandro Abdelnur, the Cloudera software engineer who founded the project that, in 2011, became Apache Oozie. Alejandro is also on the PMC of Apache Hadoop.
What led you to your project idea(s)?
Back in 2008, while I was working at Yahoo! in Bangalore, we began to notice that other teams were taking a variety of manual, ad hoc approaches (whether using shell scripts, JobControl, Ant, and so on) to managing multiple Hadoop jobs. There was clearly an opportunity to build a single solution that everyone could use and that could be much more efficiently supported internally.
Hadoop Summit convenes next week, and even if you’re not attending, there are a host of meetup opportunities available to you during the week.
Here are just a few, and you can find a full list here.
Tues. June 25
For those of you who missed the show, session video and presentation slides (as well as photos) will be available via hbasecon.com in a few weeks. (To be notified, follow @cloudera or @ClouderaEng.) Although it’s not quite as good as being there with the rest of the community, you’ll still be able to learn from the real-world experiences of Apache HBase users like Facebook, Box, Yahoo!, Salesforce.com, Pinterest, Twitter, Groupon, and more.
While you’re waiting for that, allow me to bring you just this single photo to capture the HBaseCon experience:
HBaseCon 2013 is this Thursday (June 13 in San Francisco), and we can hardly wait!
To complete the “preview” cycle, today we bring you a snapshot of the Case Studies track, which offers a cross-section of the many real-world use cases for Apache HBase. You will learn how a range of companies across diverse industries use it at the heart of their IT infrastructure to run their businesses.
What do you do at Cloudera (and in which Apache project(s) are you involved)?
I’m a software engineer on the Search team. I’ve been involved in the Apache Lucene community since 2006 and Apache Solr since around 2009. I spend a lot of time adding features to Solr and fixing bugs, as well as working on improving Solr integration with the rest of the Hadoop ecosystem. I kind of think of myself as a “distributed search guy” at the moment.
Unbelievably, HBaseCon 2013 is only one week away (June 13 in San Francisco)!
Today we bring you a preview of the Ecosystems track, a grand tour (in 20-minute increments) of the fascinating current work being done across the community to extend or build on top of Apache HBase.