Category Archives: Use Case

Inside Wargaming.net’s Data-driven, Real-time Rules Engine

Categories: CDH, Guest, Use Case

In this post, engineers from Wargaming.net, the online game developer and publisher, describe the design of their real-time recommendation engine built on CDH.

The scope of activities at Wargaming.net extends far beyond the development of games. We work on dozens of internal projects simultaneously, and our Data-driven Real-time Rules Engine (DDRRE) is among the most ambitious.

DDRRE is a system that analyzes large amounts of data in real time,


Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Categories: Flume, Hadoop, Kafka, Platform Security & Cybersecurity, Use Case

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and the performance-tuning techniques that got it there.

SIEM platforms are a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is that they are only as accurate as the fidelity of the data they receive, which is why Apache Hadoop is becoming such a valuable platform for that use case.


Fast and Flexible Risk Aggregation on Apache Spark

Categories: Guest, Spark, Use Case

In this guest post, Deenar Toraskar, founder of risk-analytics solution provider Think Reactive and a contributor to Spark, describes why new requirements for agile, self-service, and VaR reporting help make the case for building out new analytic infrastructure on the Apache Hadoop ecosystem.

As described previously in this post, Value at Risk (VaR) is a popular risk measure used for risk management,


How-to: Predict Telco Churn with Apache Spark MLlib

Categories: Data Science, Spark, Use Case

Spark MLlib is growing in popularity for machine-learning model development due to its elegance and usability. In this post, you’ll learn why.

Spark MLlib is a library for performing machine learning and associated tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take a couple of lines of code and leverage hundreds of machines. MLlib greatly simplifies the model development process.

In this post,


How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache Kafka

Categories: Kafka, Spark, Use Case

Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture.

Real-time stream processing with Apache Kafka as a backbone provides many benefits. For example, this architectural pattern can handle massive, organic data growth via the dynamic addition of streaming sources such as mobile devices, web servers, system logs, and wearable devices (aka the “Internet of Things”). Kafka can also help capture data in real time and enable the proactive analysis of that data through Spark Streaming.
