Category Archives: Spark

Apache Spark: A Delight for Developers

Categories: General, Spark

Sure, Spark is fast, but it also gives developers a positive experience they won’t soon forget.

Apache Spark is well known today for its performance benefits over MapReduce, as well as its versatility. However, another important benefit – the elegance of the development experience – gets less mainstream attention.

In this post, you’ll learn about just a few of the features in Spark that make development a pure pleasure.

Why Apache Spark is a Crossover Hit for Data Scientists

Categories: Data Science, Spark, Use Case

Spark is a compelling multi-purpose platform for use cases that span both investigative and operational analytics.

Data science is a broad church. I am a data scientist — or so I’ve been told — but what I do is actually quite different from what other “data scientists” do. For example, there are those practicing “investigative analytics” and those implementing “operational analytics.” (I’m in the second camp.)

Data scientists performing investigative analytics use interactive statistical environments like R to perform ad-hoc,

This Month (and Year) in the Ecosystem (December 2013)

Categories: Community, Hadoop, HBase, Impala, Spark

Welcome to our sixth edition of “This Month in the Ecosystem,” a digest of highlights from December 2013 (never intended to be comprehensive; for completeness, see the excellent Hadoop Weekly).

With the close of 2013, we also thought it appropriate to include some high points from across the year (not listed in any particular order):

A New Web UI for Spark

Categories: Hue, Spark

The team behind Hue, the open source Web UI that makes Apache Hadoop easier to use, strikes again with a new Spark app.

Editor’s note: This post was recently published on the Hue blog. We republish it here for your convenience.

Hi Spark Makers!

A Hue application for Apache Spark (incubating) was recently created. It lets users execute and monitor Spark jobs directly from their browser, making them more productive.

Putting Spark to Use: Fast In-Memory Computing for Your Big Data Applications

Categories: CDH, Guest, Hadoop, MapReduce, Spark

Our thanks to Databricks, the company behind Apache Spark (incubating), for providing the guest post below. Cloudera and Databricks recently announced that Cloudera will distribute and support Spark in CDH. Look for more posts describing Spark internals and Spark + CDH use cases in the near future.

Apache Hadoop has revolutionized big data processing, enabling users to store and process huge amounts of data at very low cost.
