Cloudera is announcing the general availability of support for Spark, bringing interactive machine learning and stream processing to enterprise data hubs.
Cloudera is pleased to announce the immediate availability of its first release of Apache Spark for Cloudera Enterprise (comprising CDH and Cloudera Manager).
Spark was created at UC Berkeley and contributed to the Apache Software Foundation, and it has quickly gained adoption for machine learning, interactive analytics, and streaming analytics over large datasets. It features a general programming model for writing applications by composing arbitrary operators, such as mappers, reducers, joins, group-bys, and filters. Spark keeps track of the data that each operator produces, which lets applications reliably keep that data in memory and makes Spark well suited to low-latency computations and efficient iterative algorithms. Spark applications can be up to 100x faster than equivalent MapReduce applications and can require 2x to 10x less code.
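To make the idea of composing operators more concrete, here is a minimal sketch in plain Python. It does not use Spark or its API; the helper names (`flat_map`, `reduce_by_key`, `word_count`) and the `min_length` parameter are hypothetical and exist only for illustration. It builds a word count out of flat-map, filter, map, and reduce-by-key steps, which roughly mirror the RDD operations `flatMap`, `filter`, `map`, and `reduceByKey` in Spark.

```python
from collections import defaultdict
from functools import reduce


def flat_map(fn, items):
    """Apply fn to each item and flatten the resulting lists into one list."""
    return [out for item in items for out in fn(item)]


def reduce_by_key(fn, pairs):
    """Group (key, value) pairs by key, then fold each group's values with fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


def word_count(lines, min_length=1):
    """Count words by chaining small operators (illustrative only, not Spark)."""
    # flat-map: split every line into lowercase words
    words = flat_map(lambda line: line.lower().split(), lines)
    # filter: drop words shorter than min_length (a hypothetical parameter)
    kept = [word for word in words if len(word) >= min_length]
    # map: pair each word with an initial count of 1
    pairs = [(word, 1) for word in kept]
    # reduce-by-key: sum the counts for each distinct word
    return reduce_by_key(lambda a, b: a + b, pairs)


if __name__ == "__main__":
    print(word_count(["Spark on CDH", "spark and MapReduce"]))
```

Unlike this sketch, which runs eagerly on in-memory lists in a single process, Spark distributes each step across a cluster and can keep intermediate results in memory for reuse.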
Cloudera provides enterprise support for Spark through Cloudera Enterprise Flex Edition (as an optional component) and Data Hub Edition (as an included component) subscriptions. This release provides Spark 0.9.0, tested for use with Spark Standalone Mode on CDH 4 (4.4.0 and later). Expect releases for Cloudera Enterprise 5 (comprising CDH 5 and Cloudera Manager 5) and Spark on YARN in the near future.
To get started now, you can follow these instructions to install Spark using parcels with Cloudera Manager. The instructions also walk you through basic configuration and a simple WordCount example on Spark.
Once you get going, we would love to hear your feedback:
- You can ask questions, get help, and share your growing expertise on our Spark community forum.
- You can file a bug through our public Jira instances.