Category Archives: Use Case

Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)

Categories: HBase, Kafka, Use Case

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.

The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,

Read More

The Barclays Data Science Hackathon: Using Apache Spark and Scala for Rapid Prototyping

Categories: Guest, Spark, Use Case

In this guest post, members of the Barclays Advanced Data Analytics Team describe the results of an offsite hackathon to develop a recommendation system using Apache Spark.

In the depths of the cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a week to collaboratively solve a key business problem: how to design a better customer experience. We framed the problem in the context of using customer shopping behavior data to build a personalized recommender system.

Read More

Genome Analysis Toolkit: Now Using Apache Spark for Data Processing

Categories: Data Science, Spark, Use Case

Users of the latest release of the Genome Analysis Toolkit, an open source framework for analyzing high-throughput DNA sequencing data, can now choose Apache Spark for data processing.

Ever since the Human Genome Project produced the first draft sequence of the human genome in 2000, the cost of sequencing has dropped exponentially, from around US$100 million per genome then to around US$1,000 today. Over the same period, we have seen massive growth in the storage and processing capabilities of big data technologies like Apache Hadoop.

Read More

Inside Wargaming.net’s Data-driven, Real-time Rules Engine

Categories: CDH, Guest, Use Case

In this post, engineers from Wargaming.net, the online game developer and publisher, describe the design of their real-time recommendation engine built on CDH.

The scope of activities at Wargaming.net extends far beyond the development of games. We work on dozens of internal projects simultaneously, and our Data-driven Real-time Rules Engine (DDRRE) is among the most ambitious.

DDRRE is a system that analyzes large amounts of data in real time,

Read More

Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Categories: Flume, Hadoop, Kafka, Security, Use Case

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques that got it there.

SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is that their accuracy depends on the fidelity of the data involved, which is why Apache Hadoop is becoming such a valuable platform for that use case.

Read More