Welcome to our third edition of “This Month in the Ecosystem,” a digest of highlights from September 2013 (never intended to be comprehensive; for completeness, see Hadoop Weekly).
Note: there were a few other interesting developments this week, but out of respect for the calendar, I’ll address them next month.
- New Ecosystem Projects Joined the Apache Incubator
A trio of Hadoop-related projects were accepted into the Apache Incubator, including Storm (for real-time event processing – contributed by Nathan Marz and championed by Doug Cutting), Samza (for processing of data streamed from publish-subscribe systems – contributed by LinkedIn and championed by Jakob Homan), and Sentry (for role-based authorization and control in Hadoop – contributed by Cloudera and championed by Arvind Prabhakar). The first two newly incubating projects reflect the consensus for enriching the platform to include MapReduce alternatives, whereas the third reflects its evolution toward an enterprise-class security ideal.
Read the Storm Proposal | Read the Samza Proposal | Read the Sentry Proposal
- Python Became a First-Class Citizen in Apache Pig
Mortar Data, historically a center of Python + Hadoop evangelism, contributed code upstream to the Pig trunk that brings CPython support to Apache Pig (obviating the inherent limitations of Jython UDFs or streaming). And Pythonistic data scientists rejoiced!
Learn more about CPython support in Pig
- DataFu Reached a Milestone
DataFu, the library of data mining/statistical analysis UDFs for Pig open-sourced by LinkedIn (and shipping inside Cloudera’s platform), reached version 1.0 this month. In the blog post referenced below, LinkedIn engineer Matthew Vaughan goes into deep detail about current features.
Read about functionality in DataFu 1.0
- Big Data Made a Big Splash at Oracle OpenWorld 2013
Oracle OpenWorld turned out to be a bonanza for Big Data enthusiasts, with more sessions than ever before on Big Data use cases, and specifically on the architecture of Oracle’s Big Data Appliance, which is powered by Cloudera software (some of the latter sessions were delivered by Cloudera engineers).
Review the Big Data agenda delivered at Oracle OpenWorld
- The Community Meetup Schedule for Strata + Hadoop World 2013 Solidified
More than 10 community meetups are planned for Strata + Hadoop World week (covering Cloudera Impala + Parquet, Apache Hive, Apache Sqoop, Apache HBase, Apache Flume, Apache Sentry, and more), occurring onsite at the show as well as at offsite locations. It being early October, it’s time to RSVP and add at least one to your calendar.
See the meetup schedule at Strata + Hadoop World
The next installment of “This Month in the Ecosystem” will be published in early November.
Justin Kestelyn is Cloudera’s developer outreach director.