You may have noticed that this report went on hiatus for December 2014 due to a lack of critical news mass (plus, we realize that most of you are out of the loop until mid-January). It’s back with a vengeance, though:
- Cloudera and Google announced new work to bring Apache Spark support to Google Cloud Dataflow, via the Dataflow SDK. This new Spark “runner” is now available in the form of a Cloudera Labs project.
- Also released in Cloudera Labs: SparkOnHBase, an integration between Spark and Apache HBase.
- Spotify described how Apache Crunch is becoming its main tool for building data pipelines, and the value of its in-house Crunch libraries (crunch-lib).
- Apache NiFi, a dataflow management and automation system, entered the Apache Incubator.
- Transparent data encryption in HDFS became production-ready once the upstream patches related to Cloudera’s release of CDH 5.3 were committed.
- Netflix open-sourced its in-house UDFs for Hive and Pig, under the name Surus.
- Apache Hive 0.14, Apache Sqoop 1.99.4, Apache Tez 0.6, and Apache Pig 0.14 were all released by their respective communities.
- The Call for Papers for HBaseCon 2015 (May 7 in San Francisco), the conference for the Apache HBase community, opened. The CFP closes at midnight on Feb. 6, so don’t wait.
- Apache Flink, Apache Drill, and Apache Falcon graduated to Top-Level Projects.
That’s all for this month, folks!
Justin Kestelyn is Cloudera’s developer outreach director.