Category Archives: Use Case

Using Apache Spark for Massively Parallel NLP at TripAdvisor

Categories: Guest Spark Use Case

Thanks to Jeff Palmucci, Director of Machine Learning at TripAdvisor, for permission to republish the following (originally appeared in TripAdvisor’s Engineering/Operations blog).

Here at TripAdvisor we have a lot of reviews, several hundred million according to the last announcement. I work with machine learning, and one thing we love in machine learning is putting lots of data to use.

I’ve been working on an interesting problem lately and I’d like to tell you about it.

Read More

How Apache Spark, Scala, and Functional Programming Made Hard Problems Easy at Barclays

Categories: Guest Spark Use Case

Thanks to Barclays employees Sam Savage, VP Data Science, and Harry Powell, Head of Advanced Analytics, for the guest post below about the Barclays use case for Apache Spark and its Scala API.

At Barclays, our team recently built an application called Insights Engine to execute an arbitrary number N of near-arbitrary SQL-like queries and execute them in a way that can scale with increasing N.

Read More

Designing Fraud-Detection Architecture That Works Like Your Brain Does

Categories: Flume HBase Kafka Spark Use Case

To design effective fraud-detection architecture, look no further than the human brain (with some help from Spark Streaming and Apache Kafka).

At its core, fraud detection is about detection whether people are behaving “as they should,” otherwise known as catching anomalies in a stream of events. This goal is reflected in diverse applications such as detecting credit-card fraud, flagging patients who are doctor shopping to obtain a supply of prescription drugs,

Read More

Text Mining with Impala

Categories: Guest Impala Use Case

Thanks to Torsten Kilias and Alexander Löser of the Beuth University of Applied Sciences in Berlin for the following guest post about their INDREX project and its integration with Impala for integrated management of textual and relational data.

Textual data is a core source of information in the enterprise. Example demands arise from sales departments (monitor and identify leads), human resources (identify professionals with capabilities in ‘xyz’), market research (campaign monitoring from the social web),

Read More

How Edmunds.com Used Spark Streaming to Build a Near Real-Time Dashboard

Categories: Cloudera Labs Flume Guest Spark Use Case

Thanks to Sam Shuster, Software Engineer at Edmunds.com, for the guest post below about his company’s use case for Spark Streaming, SparkOnHBase, and Morphlines.

Every year, the Super Bowl brings parties, food and hopefully a great game to appease everyone’s football appetites until the fall. With any event that brings in around 114 million viewers with larger numbers each year, Americans have also grown accustomed to commercials with production budgets on par with television shows and with entertainment value that tries to rival even the game itself.

Read More