Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques that got it there.
SIEM platforms are a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is that their accuracy depends on the fidelity of the underlying data, which is why Apache Hadoop is becoming such a valuable platform for this use case.
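To illustrate why data fidelity matters, here is a minimal Python sketch of IOC matching over events from two different kinds of infrastructure. Everything in it is hypothetical: the log formats, the `BAD_IPS` indicator list, and the `normalize`/`find_iocs` helpers are illustrative only, not part of Vodafone's system or any SIEM product. Each raw line is mapped onto a common schema before matching. A line that cannot be parsed can never match an indicator, so the sketch counts those lines rather than silently discarding them.

```python
import re
from typing import Optional

# Hypothetical indicator-of-compromise (IOC) list: known-bad IP addresses.
BAD_IPS = {"203.0.113.7", "198.51.100.23"}

# Two hypothetical raw log formats coming from different infrastructure.
FIREWALL_RE = re.compile(r"^FW (?P<ts>\S+) src=(?P<src>\S+) dst=(?P<dst>\S+)$")
PROXY_RE = re.compile(r"^(?P<ts>\S+) PROXY client=(?P<src>\S+) url=(?P<url>\S+)$")


def normalize(line: str) -> Optional[dict]:
    """Map a raw log line onto a common schema, or return None if unparseable."""
    m = FIREWALL_RE.match(line)
    if m:
        return {"ts": m["ts"], "source": "firewall", "ip": m["src"]}
    m = PROXY_RE.match(line)
    if m:
        return {"ts": m["ts"], "source": "proxy", "ip": m["src"]}
    return None


def find_iocs(lines):
    """Return (events whose IP is a known IOC, number of unparseable lines)."""
    hits, dropped = [], 0
    for line in lines:
        event = normalize(line)
        if event is None:
            dropped += 1  # lost fidelity: this event can never match an IOC
        elif event["ip"] in BAD_IPS:
            hits.append(event)
    return hits, dropped
```

If `find_iocs` is fed one firewall line from `203.0.113.7`, one clean proxy line, and one garbled line, it should report one hit and one dropped line. Tracking the dropped count is one simple way to see how much detection coverage is lost to poor-quality input.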
In this guest post, Deenar Toraskar, founder of risk-analytics solution provider Think Reactive and a contributor to Spark, describes why new requirements for agile, self-service, and VaR reporting help make the case for building out new analytic infrastructure on the Apache Hadoop ecosystem.
As described previously in this post, Value at Risk (VaR) is a popular measure used in financial risk management.
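For readers new to VaR, here is a minimal, single-machine Python sketch of one common way to estimate it: historical simulation. It assumes VaR at confidence level α is the α-quantile of the empirical loss distribution, with losses taken as negated one-period returns. This may not be the method the post itself uses (Monte Carlo simulation is another common choice), and the `historical_var` name is illustrative.

```python
import math


def historical_var(returns, confidence=0.99):
    """One-period historical-simulation VaR, returned as a positive loss fraction.

    Assumption: VaR is the smallest observed loss L such that at least
    `confidence` of the observed losses are <= L (an empirical quantile).
    """
    if not returns:
        raise ValueError("need at least one return")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    losses = sorted(-r for r in returns)  # a negative return is a positive loss
    idx = math.ceil(confidence * len(losses)) - 1
    return losses[max(idx, 0)]
```

With the ten daily returns `[-0.05, -0.02, 0.01, 0.03, -0.01, 0.02, 0.00, -0.03, 0.04, 0.01]`, the 90% VaR is 0.03: on nine of the ten observed days, the loss was at most 3% of the portfolio value. Multiplying by the portfolio value converts the result to currency. At scale, the per-scenario work parallelizes naturally, which is part of the case for running it on a cluster.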
Spark MLlib is growing in popularity for machine-learning model development due to its elegance and usability. In this post, you’ll learn why.
Spark MLlib is a library for performing machine learning and related tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take just a couple of lines of code and leverage hundreds of machines. MLlib greatly simplifies the model development process.
In this post,
Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture.
Real-time stream processing with Apache Kafka as a backbone provides many benefits. For example, this architectural pattern can handle massive, organic data growth through the dynamic addition of streaming sources such as mobile devices, web servers, system logs, and wearable devices (the so-called “Internet of Things”). Kafka can also help capture data in real time and enable proactive analysis of that data through Spark Streaming.
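To make the point about dynamically adding sources concrete, here is a toy in-memory Python model of the partitioned, append-only log that Kafka is built around. It is not the Kafka client API. `ToyTopic`, `produce`, and `poll` are hypothetical names, and the model hashes keys with `zlib.crc32` rather than Kafka's default murmur2-based partitioner. Producers simply append records to the topic, and each consumer group tracks its own offsets. As a result, a new source can start producing without any change on the consumer side.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Toy stand-in for a partitioned, append-only log (NOT the Kafka API)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, per (consumer group, partition).
        self.offsets = defaultdict(int)

    def produce(self, key: str, value: str) -> int:
        """Append a record; the same key always maps to the same partition."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group: str):
        """Return records this group has not seen yet, then advance its offsets."""
        out = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[(group, p)]
            out.extend(log[start:])
            self.offsets[(group, p)] = len(log)
        return out


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    topic.produce("web-01", "GET /index.html")      # existing source
    print(topic.poll("streaming"))                  # streaming job reads it
    topic.produce("phone-42", "gps 51.5,-0.1")      # new source, no reconfiguration
    print(topic.poll("streaming"))                  # only the new record
    print(len(topic.poll("archive")))               # independent group sees both
```

Because the log is retained and offsets are per group, a real-time job and a batch archiver can read the same stream at their own pace. That decoupling is what lets the number of sources grow organically.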
Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data.
Event processing involves tracking and analyzing streams of data from events to support better insight and decision making. With the recent explosion in data volume and diversity of data sources, this goal can be quite challenging for architects to achieve.
Complex event processing (CEP) is a type of event processing that combines data from multiple sources to identify patterns and complex relationships across various events.
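As a small illustration of the kind of cross-source pattern a CEP engine detects, here is a minimal Python sketch of one sliding-window rule. It is not the business execution engine the post combines with CDH. The `Event` type, the source names (`auth`, `netflow`), the event kinds, and the rule itself are all hypothetical. The rule fires when a user accumulates at least `threshold` failed logins from one source and then triggers a large upload seen by another source, all within `window` seconds.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float    # seconds since epoch
    source: str  # hypothetical source names, e.g. "auth" or "netflow"
    user: str
    kind: str    # hypothetical kinds, e.g. "login_failed" or "large_upload"


def detect(events, threshold=3, window=60.0):
    """Return (user, ts) alerts where >= `threshold` failed logins and a
    subsequent large upload all occur within `window` seconds."""
    failures = defaultdict(deque)  # user -> timestamps of in-window failures
    alerts = []
    for ev in sorted(events, key=lambda ev: ev.ts):
        recent = failures[ev.user]
        while recent and ev.ts - recent[0] > window:
            recent.popleft()  # expire failures that fell out of the window
        if ev.kind == "login_failed":
            recent.append(ev.ts)
        elif ev.kind == "large_upload" and len(recent) >= threshold:
            alerts.append((ev.user, ev.ts))
            recent.clear()  # avoid re-alerting on the same burst of failures
    return alerts
```

Suppose a user has three failed logins at 0, 10, and 20 seconds followed by an upload at 30 seconds. That user triggers an alert. If the failures are spread out so that fewer than three fall inside the window, there is no alert. Neither event stream alone reveals the pattern, which is the essence of CEP.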