Category Archives: Kafka

New in Cloudera Enterprise 5.8: Flafka Improvements for Real-Time Data Ingest

Categories: Data Ingestion Flume Kafka

Learn about the new Apache Flume and Apache Kafka integration (aka, “Flafka”) available in CDH 5.8 and its support for the new enterprise features in Kafka 0.9.

Over a year ago, we wrote about the integration of Flume and Kafka (Flafka) for data ingest into Apache Hadoop. Since then, Flafka has proven to be quite popular among CDH users, and we believe that popularity is based on the fact that in Kafka deployments,

Read More

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Categories: Data Ingestion Flume Hadoop Kafka Search Spark Use Case

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends itself to batch-ingestion methods. Although such methods are suitable for many use cases,

Read More

New in Cloudera Labs: Envelope (for Apache Spark Streaming)

Categories: Cloudera Labs Data Ingestion Kafka Kudu

As a warm-up to Spark Summit West in San Francisco (June 6-8),  we’ve added a new project to Cloudera Labs that makes building Spark Streaming pipelines considerably easier.

Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for parallel processing, expressive transformations, fault tolerance, and exactly-once processing. But it requires a programmer to write code,

Read More

Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)

Categories: HBase Kafka Use Case

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander’s engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app.

The Spendlytics iOS app is designed to help Santander’s personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly,

Read More

Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Categories: Flume Hadoop Kafka Platform Security & Cybersecurity Use Case

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques and that got it there.

SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is, they’re only as accurate as the fidelity of the data involved, which is why Apache Hadoop is becoming such a valuable platform for that use case.

Read More