In this post, engineers from Wargaming.net, the online game developer and publisher, describe the design of their real-time recommendation engine built on CDH.
The scope of activities at Wargaming.net extends far beyond the development of games. We work on dozens of internal projects simultaneously, and our Data-driven Real-time Rules Engine (DDRRE) is among the most ambitious.
DDRRE is a system that analyzes large amounts of data in real time,
Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques and that got it there.
SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is, they’re only as accurate as the fidelity of the data involved, which is why Apache Hadoop is becoming such a valuable platform for that use case.
In this guest post, Deenar Toraskar, founder of risk-analytics solution provider Think Reactive and a contributor to Spark, describes why new requirements for agile, self-service, and VaR reporting help make the case for building out new analytic infrastructure on the Apache Hadoop ecosystem.
As described previously in this post, Value at Risk (VaR) is a popular risk measure used for risk management,
Spark MLLib is growing in popularity for machine-learning model development due to its elegance and usability. In this post, you’ll learn why.
Spark MLLib is a library for performing machine-learning and associated tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take a couple lines of code and leverage hundreds of machines. MLlib greatly simplifies the model development process.
In this post,
Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture.
Real-time stream processing with Apache Kafka as a backbone provides many benefits. For example, this architectural pattern can handle massive, organic data growth via the dynamic addition of streaming sources such as mobile devices, web servers, system logs, and wearable device data (aka, “Internet of Things”). Kafka can also help capture data in real-time and enable the proactive analysis of that data through Spark Streaming.