Category Archives: Kafka

New in Cloudera Labs: Envelope (for Apache Spark Streaming)

Categories: Cloudera Labs Data Ingestion Kafka Kudu

As a warm-up to Spark Summit West in San Francisco (June 6-8), we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier.

Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for parallel processing, expressive transformations, fault tolerance, and exactly-once processing. But it requires a programmer to write code,

Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)

Categories: HBase Kafka Use Case

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.

The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,

Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Categories: Flume Hadoop Kafka Platform Security & Cybersecurity Use Case

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques that got it there.

SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is, they’re only as accurate as the fidelity of the data involved, which is why Apache Hadoop is becoming such a valuable platform for that use case.

What’s New in Cloudera’s Distribution of Apache Kafka?

Categories: Kafka Platform Security & Cybersecurity

Cloudera’s distribution of Kafka (now at release 2.0) is based on Apache Kafka 0.9 and includes various new features (especially for security and usability), enhancements, and bug fixes.

Kafka is rapidly gaining momentum in enterprise Apache Hadoop deployments and has become the de facto messaging bus in most Big Data technology stacks. During this period of rapid adoption (and since Cloudera began shipping Kafka in February 2015),

How-to: Build a Real-Time Search System Using StreamSets, Apache Kafka, and Cloudera Search

Categories: Cloudera Manager Guest How-to Hue Kafka Search

Thanks to Jonathan Natkins, a field engineer from StreamSets, for the guest post below about using StreamSets Data Collector (open source, GUI-driven ingest technology for developing and operating data pipelines with a minimum of code) together with Cloudera Search and Hue to build a real-time search environment.

As pressure mounts on data engineers to deliver more data from more sources in less time, StreamSets Data Collector can serve as a linchpin in the data management process,