Apache Flume Archives - Cloudera Blog

May 10, 2018 | Technical

Scalability of Kafka Messaging using Consumer Groups

Traditional messaging models fall into two categories: Shared Message Queues and Publish-Subscribe models. Apache Kafka fills the gaps that these traditional messaging models left open.

by Cloudera 6 min read

Apache Flume Apache Kafka

August 23, 2016 | Business

New in Cloudera Enterprise 5.8: Flafka Improvements for Real-Time Data Ingest

Learn about the new Apache Flume and Apache Kafka integration (aka, “Flafka”) available in CDH 5.8 and its support for the new enterprise features in Kafka 0.9. Over a year ago, we wrote about the integration of Flume and Kafka (Flafka) for data ingest into Apache Hadoop. Since then, Flafka has proven to be quite […]

by Cloudera, Tristan Stevens 6 min read

July 29, 2016 | Technical

How-to: Ingest Email into Apache Hadoop in Real Time for Analysis

Apache Hadoop is a proven platform for long-term storage and archiving of structured and unstructured data. Related ecosystem tools, such as Apache Flume and Apache Sqoop, allow users to easily ingest structured and semi-structured data without requiring the creation of custom code. Unstructured data, however, is a more challenging subset of data that typically lends […]

by Cloudera, Stefan Salandy 10 min read

Apache Flume Apache Hadoop Apache Kafka Apache Spark Data Ingestion Search

June 23, 2016 | Technical

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time

This framework, based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating), can detect and report on abnormal bad HTTP requests within seconds. Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for […]

by Cloudera 14 min read

Apache Flume Apache Impala Apache Spark Cloudera Enterprise

March 9, 2016 | Technical

Building, Benchmarking, and Tuning Syslog Ingest Architecture at Vodafone UK

Vodafone UK’s new SIEM system relies on Apache Flume and Apache Kafka to ingest nearly 1 million events per second. In this post, learn about the architecture and performance-tuning techniques that got it there. SIEM platforms provide a useful tool for identifying indicators of compromise across disparate infrastructure. The catch is, they’re only as […]

by Cloudera 11 min read

Apache Flume Apache Hadoop Apache Kafka Security, Risk, & Compliance

August 3, 2015 | Technical

Inside Santander’s Near Real-Time Data Ingest Architecture

Learn about the near real-time data ingest architecture for transforming and enriching data streams using Apache Flume, Apache Kafka, and RocksDB at Santander UK. Cloudera Professional Services has been working with Santander UK to build a near real-time (NRT) transactional analytics system on Apache Hadoop. The objective is to capture, transform, enrich, count, and store […]

by Cloudera, Ian Buss, Rob Siwicki 7 min read

Apache Flume Apache HBase Apache Kafka

June 8, 2015 | Technical

Inside Apache HBase’s New Support for MOBs

Learn about the design decisions behind HBase’s new support for MOBs. Apache HBase is a distributed, scalable, performant, consistent key value database that can store a variety of binary data types. It excels at storing many relatively small values (<10K), and providing low-latency reads and writes. However, there is a growing demand for storing documents, […]

by Cloudera, Jingcheng Du 7 min read

Apache Flume Apache HBase Apache Sqoop Cloudera Enterprise

June 1, 2015 | Technical

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment. The Apache Hadoop ecosystem has become a preferred platform for enterprises seeking to process and understand large-scale data in real time. Technologies like Apache Kafka, Apache Flume, Apache Spark, Apache Storm, and Apache Samza […]

by Cloudera 6 min read

Apache Flume Apache Hadoop Apache HBase Apache Kafka Apache Spark Data Ingestion

November 6, 2014 | Technical

Flafka: Apache Flume Meets Apache Kafka for Event Processing

The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure. In this previous post you learned some Apache Kafka basics and explored a scenario for using Kafka in an online application. This post takes you a step further and highlights the integration of Kafka with Apache Hadoop, demonstrating […]

by Cloudera, Jeff Holoman 10 min read

Apache Flume Apache Kafka

March 14, 2012 | Technical

Apache HBase + Apache Hadoop + Xceivers

Some of the configuration properties found in Apache Hadoop have a direct effect on clients, such as Apache HBase. One of those properties is called “dfs.datanode.max.xcievers”, and belongs to the HDFS subproject. It defines the number of server side threads and – to some extent – sockets used for data connections. Setting this number too […]

by Cloudera 14 min read

Apache Flume Apache Hadoop Apache HBase Oozie
