Apache Spark 1.0 is Released

Spark 1.0 is its biggest release yet, with a list of new features for enterprise customers.

Congratulations to the Apache Spark community for today’s release of Spark 1.0, which includes contributions from more than 100 people (including Cloudera’s own Diana Carroll, Mark Grover, Ted Malaska, Sean Owen, Sandy Ryza, and Marcelo Vanzin). We think this release is an important milestone in the continuing rapid uptake of Spark by enterprises — which is supported by Cloudera via Cloudera Enterprise 5 — as a modern, general-purpose processing engine for Apache Hadoop.

Spark 1.0 contains, among other things:

  • History Server, for improved monitoring capabilities
  • Improvements to MLlib (sparse vector support)
  • Improvements to Apache Avro integration
  • Support for Java 8 and lambda expressions
  • Simplified job submission to YARN cluster
  • Spark Streaming integration with Kerberos
  • Authentication of all Spark communications
  • Introduction of Spark SQL (alpha)
  • Unified application configuration and submission through spark-submit
  • PySpark on YARN support

(You’ll find more details about these features in the Release Notes. You can also read more from Databricks, here.)

Spark 1.0 will be packaged inside Cloudera’s CDH 5.1 release and will also be available as a Cloudera Manager 5.1 parcel; both are forthcoming.

Fire it up!

Justin Kestelyn is Cloudera’s developer outreach director.



14 responses on “Apache Spark 1.0 is Released”

  1. What is the date for CDH 5.1 & Spark 1.0?

    When Spark 1.0 is released with CDH 5.1, will it include Spark SQL?

    What is the expected date for CDH 5.1?

    1. Justin Kestelyn (@kestelyn) Post author

      As explained above, CDH 5.1 will contain Spark 1.0, including Spark SQL. However, Spark SQL is currently considered “alpha” and thus will not be supported in that release.

      CDH 5.1 will be available very soon (mid-summer).

  2. Sourabh Chaki

    Can we integrate Spark 1.0 with CDH 5.0? If yes, what are the steps for that? I believe we need to explode the CDH 5.0 parcel and replace Spark 0.9 with 1.0. Will this approach work? Please confirm.

    1. Justin Kestelyn (@kestelyn) Post author

      Spark 1.0 will ship inside CDH 5.1 (available imminently).

  3. Manoj

    We are currently on CDH 4.6 and would like to test Spark SQL and Spark Streaming. What are our options?

  4. Calin-Andrei Burloiu

    We upgraded to CDH 5.1.0 and we now have Spark 1.0.0. Unfortunately, I can’t see the Spark History Server when I try to add it in the Spark service’s Instances tab. Additionally, I can no longer see running jobs in the Master Web UI. Do you happen to know what the problem is?

    1. Justin Kestelyn (@kestelyn) Post author

      For more rapid response, I recommend you post this issue to the “Spark” area at cloudera.com/community.

    1. Justin Kestelyn (@kestelyn) Post author

      Spark SQL is in 5.1.2 (along with other Spark modules). It’s an alpha, however, and thus not supported.

    1. Justin Kestelyn (@kestelyn) Post author

      Spark has been in the CDH parcel since 4.7. So, although you won’t see anything other than “Spark 0.9” as an available parcel in CM, Spark 1.2 can be found in the “CDH” one.