This Month in the Ecosystem (January 2014)

Categories: Hadoop

Welcome to our fifth edition of “This Month in the Ecosystem,” a digest of highlights from January 2014 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

  • Cloudera announced the general availability of Apache Spark, with a new parcel tested for use with CDH 4.4 and beyond. This makes Spark, which coincidentally graduated from ASF incubation on nearly the same day, the newest processing framework for enterprise data hubs (in this case, for workloads involving fast, advanced analytics).
  • Speaking of Spark, its original developers at UC Berkeley’s AMPLab have released SparkR, a new front-end for running R code on a Spark cluster.
  • New benchmark testing revealed that a diverse set of 20 TPC-DS queries ran faster on Impala and Parquet than on a leading analytic DBMS and its own proprietary data store. These tantalizing results indicate that the common view of Apache Hadoop “trade-offs” (that is, the sacrifice of performance for flexibility) is no longer based on fact.
  • Foursquare announced that it has open-sourced its Hadoop connector for MongoDB, which it relies on to analyze user-generated data captured by its operational systems.
  • HBaseCon 2014 was announced and will take place on May 5, 2014, in San Francisco. The Call for Papers and Early Bird registration both close one week from today!
  • DataFu, the LinkedIn-developed library of Apache Pig UDFs, was accepted into the Apache Incubator. (DataFu is also distributed inside CDH.)
  • InfoWorld named Hadoop a winner of its 2014 “Technology of the Year” award. Kudos to the community!

That’s all for this month, folks!

Justin Kestelyn is Cloudera’s developer outreach director.