This Month in the Ecosystem (June 2014)


Welcome to our 10th edition of “This Month in the Ecosystem,” a digest of highlights from June 2014 (never intended to be comprehensive; for that, see the excellent Hadoop Weekly).

It was a pretty busy month for early summer:

  • A design document has gone upstream describing new work to make Apache Hive run on Apache Spark as a data processing backend. This effort is the first step in a broader collaborative effort by Cloudera, Databricks, IBM, Intel, and MapR to make Spark the data processing standard for the entire Apache Hadoop ecosystem, starting with Hive.
  • The above was announced by Cloudera’s Mike Olson at Spark Summit 2014, which appeared to have doubled in size over the 2013 edition!
  • Facebook publicly described HydraBase, its internal HBase implementation, in detail for the first time. (HydraBase was also the subject of a keynote session at HBaseCon 2014.) Later in June, Facebook also described how it uses HDFS in combination with RAID concepts.
  • Cloudera and Intel described their intention to bring comprehensive enterprise-class security to Hadoop under Project Rhino. One of the most recent steps is the contribution of code to support at-rest encryption in HDFS.
  • The first Accumulo Summit convened in Maryland, and there was much rejoicing. The presentations are available online.
  • Presentations and recordings from HBaseCon 2014 were released to all.
  • Yet Another General Data Processing Platform for Hadoop (YAGDPPH), Apache Flink (previously Stratosphere), entered the Apache Incubator.
  • Apache Tez graduated to a Top-Level Project.

That’s all for this month, folks!

Justin Kestelyn is Cloudera’s developer outreach director.