Category Archives: Spark

Apache Spark Resource Management and YARN App Models

Categories: Spark YARN

A concise look at the differences between how Spark and MapReduce manage cluster resources under YARN

The most popular Apache YARN application after MapReduce itself is Apache Spark. At Cloudera, we have worked hard to stabilize Spark-on-YARN (SPARK-1101), and CDH 5.0.0 added support for Spark on YARN clusters.

In this post, you’ll learn about the differences between the Spark and MapReduce architectures, why you should care,

Read More

Making Apache Spark Easier to Use in Java with Java 8

Categories: Guest Spark

Our thanks to Prashant Sharma and Matei Zaharia of Databricks for their permission to re-publish the post below about future Java 8 support in Apache Spark. Spark is now generally available inside CDH 5.

One of Apache Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala and Python, but its Java API was verbose due to the lack of function expressions.
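To illustrate the verbosity in question, here is a minimal sketch in plain Java, with no Spark dependency. It assumes `java.util.function.Function` as a stand-in for Spark's own function interfaces, and the class and method names (`LambdaSketch`, `lengthsVerbose`, `lengthsConcise`) are hypothetical. It is not taken from the original post; it only contrasts the anonymous-class form with the Java 8 lambda form of the same one-line function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LambdaSketch {
    // Anonymous inner class: the only way to pass a function before lambdas.
    // Function is used here as a stand-in for a Spark function interface (assumption).
    static List<Integer> lengthsVerbose(List<String> words) {
        Function<String, Integer> length = new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        return words.stream().map(length).collect(Collectors.toList());
    }

    // Java 8 lambda: the same function written as a single expression.
    static List<Integer> lengthsConcise(List<String> words) {
        return words.stream().map(s -> s.length()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "yarn", "cdh");
        System.out.println(lengthsVerbose(words)); // prints [5, 4, 3]
        System.out.println(lengthsConcise(words)); // prints [5, 4, 3]
    }
}
```

Both methods return the same result; the difference is only in how much ceremony surrounds the one line of logic.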

Read More

How-to: Run a Simple Apache Spark App in CDH 5

Categories: CDH How-to Spark

Getting started with Apache Spark in CDH 5.x is easy using this simple example.

Apache Spark is a general-purpose cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For reasons of performance, functionality, and API design, Spark is already becoming more popular than MapReduce for certain types of workloads. (For more background about Spark, read this post.)

In this how-to,

Read More

Letting It Flow with Spark Streaming

Categories: Guest Spark

Our thanks to Russell Cardullo and Michael Ruggiero, Data Infrastructure Engineers at Sharethrough, for the guest post below about its use case for Spark Streaming.

At Sharethrough, which offers an advertising exchange for delivering in-feed ads, we’ve been running on CDH for the past three years (after migrating from Amazon EMR), primarily for ETL. With the launch of our exchange platform in early 2013 and our desire to optimize content distribution in real time,

Read More

Apache Spark: A Delight for Developers

Categories: General Spark

Sure, Spark is fast, but it also gives developers a positive experience they won’t soon forget.

Apache Spark is well known today for its performance benefits over MapReduce, as well as its versatility. However, another important benefit – the elegance of the development experience – gets less mainstream attention.

In this post, you’ll learn about just a few of the features in Spark that make development a pure pleasure.

Read More