It was a pretty busy month for early summer:
- A design document has gone upstream describing new work to make Apache Hive run on Apache Spark as a data processing backend. This work is the first step in a broader collaborative effort by Cloudera, Databricks, IBM, Intel, and MapR to make Spark the data processing standard for the entire Apache Hadoop ecosystem, starting with Hive.
- The above was announced by Cloudera’s Mike Olson at Spark Summit 2014, which appeared to have doubled in size over the 2013 edition!
- Facebook described HydraBase, its internal HBase implementation, to a broad public audience for the first time. (HydraBase was also the subject of a keynote session at HBaseCon 2014.) Later in June, Facebook also described how it uses HDFS in combination with RAID concepts.
- Cloudera and Intel described their intention to bring comprehensive enterprise-class security to Hadoop under Project Rhino. One of the most recent steps is the contribution of code to support at-rest encryption in HDFS.
- The first Accumulo Summit convened in Maryland, and there was much rejoicing. Presentations are here.
- Presentations and recordings from HBaseCon 2014 were released to all.
- Yet Another General Data Processing Platform for Hadoop (YAGDPPH), Apache Flink (previously Stratosphere), entered the Apache Incubator.
- Apache Tez graduated to a Top-Level Project.
That’s all for this month, folks!
Justin Kestelyn is Cloudera’s developer outreach director.