Category Archives: Flume

How-to: Analyze Twitter Data with Apache Hadoop

Categories: CDH Data Ingestion Flume General Hive How-to Oozie

Social media has gained immense popularity with marketing teams, and Twitter is an effective tool for a company to get people excited about its products. Twitter makes it easy to engage users and communicate directly with them, and in turn, users can provide word-of-mouth marketing for companies by discussing the products. Given limited resources, and knowing that they may not be able to talk directly to everyone they want to target, marketing departments can be more efficient by being selective about whom they reach out to.

Read more

Notes from the Flume NG Hackathon

Categories: Flume

This post was originally published on the Apache Blog: https://blogs.apache.org/flume/entry/apache_flume_hackathon. Apache Flume is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/flume.

The next generation of Apache’s log ingestion framework, Apache Flume NG, has bolted out of the starting blocks. Last Friday, employees from a variety of companies packed Cloudera Headquarters to learn more about the project and to contribute to it themselves.

Read more

Apache Flume – Architecture of Flume NG

Categories: Avro Community Data Ingestion Flume General Hadoop

This post was originally published on the Apache Blog: https://blogs.apache.org/flume/entry/flume_ng_architecture

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Flume is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/flume. Flume NG is the work on a new major revision of Flume and is the subject of this post.

Read more

Using Flume to Collect Apache 2 Web Server Logs

Categories: Data Ingestion Flume General

Flume is a flexible, scalable, and reliable system for collecting streaming data. The Flume User Guide describes how to configure Flume, and the new Flume Cookbook contains instructions (called recipes) for common Flume use cases. In this post, we present a recipe for one such use case: using a Flume node to collect Apache 2 web server logs and deliver them to HDFS.

Using Flume Agents for Apache 2.x Web Server Logging

To connect Flume to Apache 2.x servers,

Read more