In this guest post, Deenar Toraskar, founder of risk-analytics solution provider Think Reactive and a contributor to Spark, describes why new requirements for agile, self-service, and value-at-risk (VaR) reporting help make the case for building out new analytic infrastructure on the Apache Hadoop ecosystem.
Spark MLlib is growing in popularity for machine-learning model development due to its elegance and usability. In this post, you’ll learn why.
Spark MLlib is a library for performing machine learning and associated tasks on massive datasets. With MLlib, fitting a machine-learning model to a billion observations can take a couple of lines of code and leverage hundreds of machines. MLlib greatly simplifies the model development process.
In this post,
Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture.
Real-time stream processing with Apache Kafka as a backbone provides many benefits. For example, this architectural pattern can handle massive, organic data growth via the dynamic addition of streaming sources such as mobile devices, web servers, system logs, and wearable devices (the “Internet of Things”). Kafka can also help capture data in real time and enable the proactive analysis of that data through Spark Streaming.
Combining CDH with a business execution engine can serve as a solid foundation for complex event processing on big data.
Event processing involves tracking and analyzing streams of data from events to support better insight and decision making. With the recent explosion in data volume and diversity of data sources, this goal can be quite challenging for architects to achieve.
Complex event processing (CEP) is a type of event processing that combines data from multiple sources to identify patterns and complex relationships across various events.
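To make the idea concrete, here is a minimal, single-process sketch of a CEP rule in plain Python. It is not taken from the post and does not use CDH or any business execution engine; the `Event` type, the two sources (`auth` and `payments`), the event kinds, and the rule itself (several failed logins followed by a withdrawal for the same user within a time window) are all hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float    # event time in seconds (hypothetical field)
    source: str  # which system emitted the event
    user: str
    kind: str    # e.g. "login_failed" or "withdrawal"


def merge_streams(*streams):
    """Merge several time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.ts)


def detect_suspicious(events, window=60.0, min_failures=3):
    """Flag a withdrawal preceded by at least `min_failures` failed logins
    for the same user within the last `window` seconds."""
    recent_failures = defaultdict(deque)  # user -> timestamps of failures
    alerts = []
    for e in events:
        fails = recent_failures[e.user]
        # Drop failures that have fallen out of the time window.
        while fails and e.ts - fails[0] > window:
            fails.popleft()
        if e.kind == "login_failed":
            fails.append(e.ts)
        elif e.kind == "withdrawal" and len(fails) >= min_failures:
            alerts.append((e.user, e.ts))
            fails.clear()  # avoid re-alerting on the same failures
    return alerts


# Two independent, time-ordered sources (made-up sample data).
auth = [
    Event(0.0, "auth", "alice", "login_failed"),
    Event(10.0, "auth", "alice", "login_failed"),
    Event(20.0, "auth", "alice", "login_failed"),
    Event(200.0, "auth", "bob", "login_failed"),
]
payments = [
    Event(30.0, "payments", "alice", "withdrawal"),
    Event(210.0, "payments", "bob", "withdrawal"),
]

alerts = detect_suspicious(merge_streams(auth, payments))
print(alerts)
```

With this sample data only the withdrawal by `alice` is flagged, because it follows three failed logins inside the 60-second window, while `bob` has a single failure. A production system would typically express such rules in a dedicated engine and run them over distributed streams; the sketch only shows the shape of the pattern matching.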
Thanks to Jeff Palmucci, Director of Machine Learning at TripAdvisor, for permission to republish the following post, which originally appeared on TripAdvisor’s Engineering/Operations blog.
Here at TripAdvisor we have a lot of reviews, several hundred million according to the last announcement. I work with machine learning, and one thing we love in machine learning is putting lots of data to use.
I’ve been working on an interesting problem lately and I’d like to tell you about it.