Tag Archives: java

Making Apache Spark Testing Easy with Spark Testing Base

Categories: Guest, Spark

Thanks to Holden Karau (@holdenkarau), Software Engineer at Alpine Data Labs (also a Spark contributor and book author), for providing the following post about her work on new base classes for testing Apache Spark programs.

Testing in the world of Apache Spark has often involved a lot of hand-rolled artisanal code, which frankly is a good way to ensure that developers write as few tests as possible. I’ve been doing some work with Spark Testing Base (also available on Spark Packages) to try to make testing Spark jobs as easy as “normal”


Using Apache Spark for Massively Parallel NLP at TripAdvisor

Categories: Guest, Spark, Use Case

Thanks to Jeff Palmucci, Director of Machine Learning at TripAdvisor, for permission to republish the following post (which originally appeared in TripAdvisor’s Engineering/Operations blog).

Here at TripAdvisor we have a lot of reviews, several hundred million according to the last announcement. I work with machine learning, and one thing we love in machine learning is putting lots of data to use.

I’ve been working on an interesting problem lately and I’d like to tell you about it.


How-to: Write a Cloud Provider Plugin for Cloudera Director

Categories: Cloud, How-to

Cloudera Director 1.5 introduces a new plugin architecture to enable support for additional cloud providers. If you want to implement a plugin that integrates a cloud provider not supported out of the box, or to extend one of the existing plugins, these details will get you started.

As discussed in our previous blog post, the Cloudera Director Service Provider Interface (Cloudera Director SPI) defines a Java interface and packaging standards for Cloudera Director plugins.


What’s New in Cloudera Director 1.5?

Categories: Cloud

Cloudera Director 1.5 is now available; this post describes what’s inside, including a new open source plugin interface.

Cloudera Director is the manifestation of Cloudera’s commitment to providing a simple and reliable way to deploy, scale, and manage Apache Hadoop in the cloud of your choice. With Cloudera Director 1.5, we continue the story of enabling production-ready clusters and big data applications by focusing on the following themes.


How Apache Spark, Scala, and Functional Programming Made Hard Problems Easy at Barclays

Categories: Guest, Spark, Use Case

Thanks to Barclays employees Sam Savage, VP Data Science, and Harry Powell, Head of Advanced Analytics, for the guest post below about the Barclays use case for Apache Spark and its Scala API.

At Barclays, our team recently built an application called Insights Engine to run an arbitrary number N of near-arbitrary SQL-like queries in a way that scales with increasing N.
