Author Archives: Anand Iyer and Mark Grover

Apache Spark 2.0 Beta Now Available for CDH

Categories: Hadoop Spark

Today, Cloudera announced the availability of an Apache Spark 2.0 Beta release for users of the Cloudera platform.

Apache Spark 2.0 is tremendously exciting (read this post for more background) because, among other things:

  • The Dataset API strengthens Spark’s claim to be the best tool for data engineering by combining compile-time type safety with the benefits of a query-optimization engine.
  • The Structured Streaming API models streaming data as a continuous DataFrame and lets users express operations on that data with a SQL-like API.
