This Month in the Ecosystem (January 2015)

Welcome to our 16th edition of “This Month in the Ecosystem,” a digest of highlights from January 2015 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly). 

You may have noticed that this report went on hiatus for December 2014 due to a lack of critical news mass (plus, we realize that most of you are out of the loop until mid-January). It’s back with a vengeance, though:

  • Cloudera and Google announced new work to bring Apache Spark support to Google Cloud Dataflow, via the Dataflow SDK. This new Spark “runner” is now available in the form of a Cloudera Labs project.
  • Also released in Cloudera Labs: SparkOnHBase, an integration between Spark and Apache HBase.
  • Spotify described how Apache Crunch is becoming its main tool for building data pipelines, and the value of its in-house Crunch libraries (crunch-lib). 
  • Apache NiFi, a dataflow management and automation system, entered the Apache Incubator.
  • Transparent data encryption in HDFS became production-ready as the patches related to Cloudera’s release of CDH 5.3 were committed upstream.
  • Netflix open-sourced its in-house UDFs for Hive and Pig, under the name Surus.
  • Apache Hive 0.14, Apache Sqoop 1.99.4, Apache Tez 0.6, and Apache Pig 0.14 were all released by their respective communities.
  • The Call for Papers for HBaseCon 2015 (May 7 in San Francisco), the community conference for Apache HBase, opened. The CFP CLOSES at midnight on Feb. 6, so don’t wait.
  • Apache Flink, Apache Drill, and Apache Falcon graduated to Top-Level Projects.

That’s all for this month, folks!

Justin Kestelyn is Cloudera’s developer outreach director.