Category Archives: Kafka

Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)

Categories: HBase Kafka Use Case

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.

The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,

Read More

Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Categories: Flume Hadoop Kafka Platform Security & Cybersecurity Use Case

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques and that got it there.

SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is, they’re only as accurate as the fidelity of the data involved, which is why Apache Hadoop is becoming such a valuable platform for that use case.

Read More

What’s New in Cloudera’s Distribution of Apache Kafka?

Categories: Kafka Platform Security & Cybersecurity

Cloudera’s distribution (now on release 2.0) of Kafka is based on Apache Kafka 0.9 and includes various new features (especially for security and usability), enhancements, and bug fixes.

Kafka is rapidly gaining momentum in enterprise Apache Hadoop deployments and has become the de facto messaging bus in most Big Data technology stacks. During this period of rapid adoption (and since Cloudera began shipping Kafka in February 2015),

Read More

How-to: Build a Real-Time Search System using StreamSets, Apache Kafka, and Cloudera Search

Categories: Cloudera Manager Guest How-to Hue Kafka Search

Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector—open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code—and Cloudera Search and HUE to build a real-time search environment.

As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process,

Read More

How Cigna Tuned Its Spark Streaming App for Real-time Processing with Apache Kafka

Categories: Kafka Spark Use Case

Explore the configuration changes that Cigna’s Big Data Analytics team has made to optimize the performance of its real-time architecture.

Real-time stream processing with Apache Kafka as a backbone provides many benefits. For example, this architectural pattern can handle massive, organic data growth via the dynamic addition of streaming sources such as mobile devices, web servers, system logs, and wearable device data (aka, “Internet of Things”). Kafka can also help capture data in real-time and enable the proactive analysis of that data through Spark Streaming.

Read More