This Month in the Ecosystem (May 2014)
More good news!
- Hadoop Summit San Jose 2014 wrapped up. Every attendee will have a different lens on the experience, but for me, the main takeaway was the increasingly mainstream presence of the enterprise juggernaut called Apache Hadoop. Clearly, Hadoop has earned a permanent place in the data center alongside the incumbents. (For further thoughts from Hadoop’s pioneers, watch this video of a pre-event Hive Think Tank panel on the topic, “Beyond MapReduce.”)
- As further support for the observation above, Cloudera’s new acquisition of Gazzang (as a complement to the platform work being led by Intel and Cloudera under Project Rhino) puts an exclamation point on the acute need for comprehensive security around production Hadoop deployments: perimeter and access control, auditing and lineage, and data protection/encryption/key management. Clearly, organizations in regulated industries, which have the most stringent requirements of all, are taking Hadoop very, very seriously.
- New benchmark testing from Cloudera revealed the current state of SQL-on-Hadoop technology across the ecosystem: Impala, Apache Hive, Presto, and Shark. (Preview: Impala leads in performance by 950% or more under a multiuser workload, with much higher CPU efficiency.)
- Apache Spark 1.0 was released, signifying an important milestone for that rapidly growing effort (currently the most active project in the ecosystem, based on number and diversity of contributors).
- Parquet, the general-purpose columnar storage format for Hadoop, became an Apache Incubator project. Over the past couple of years, only Spark has rivaled it for speed of adoption.
- Black Duck Software published the results of its annual Future of Open Source survey. Survey says? Eighty percent of respondents choose OSS for its quality, and half of all corporations are expected to contribute to and adopt some form of OSS in 2015.
That’s all for this month, folks!
Justin Kestelyn is Cloudera’s developer outreach director.