Category Archives: Flume

Hadoop World 2011: A Glimpse into Development

Categories: Avro Careers CDH Community Flume General Hadoop HBase HDFS Hive MapReduce Oozie Pig Sqoop Training Use Case ZooKeeper

The Development track at Hadoop World is a technical deep dive dedicated to discussion about Apache Hadoop and application development for Apache Hadoop. You will hear committers, contributors and expert users from various Hadoop projects discuss the finer points of building applications with Hadoop and the related ecosystem. The sessions will touch on foundational topics such as HDFS, HBase, Pig, Hive, Flume and other related technologies. In addition, speakers will address key development areas including tools,

Read More

Flume Community Office Hours @ Cloudera HQ, 2/28/2011

Categories: CDH Community Flume

On Monday, we held our second Flume Office Hours at Cloudera HQ in Palo Alto. The intent was to meet informally, to talk about what’s new, to answer questions, and to get feedback from the community to help prioritize features for future releases.

Below is the slide deck from Flume Office Hours:

This time we had an online presence so that folks could participate from remote locations.

Read More

Using Flume to Collect Apache 2 Web Server Logs

Categories: Data Ingestion Flume General

Flume is a flexible, scalable, and reliable system for collecting streaming data. The Flume User Guide describes how to configure Flume, and the new Flume Cookbook contains instructions (called recipes) for common Flume use cases. In this post, we present a recipe for one common use case: using a Flume node to collect Apache 2 web server logs and deliver them to HDFS.

Using Flume Agents for Apache 2.x Web Server Logging

To connect Flume to Apache 2.x servers,

Read More

Flume community update: September 2010

Categories: Community Data Ingestion Flume General

The past month has been exciting and productive for the community using and developing Cloudera’s Flume! This young system handles streaming data ingest and is a core part of Cloudera’s Distribution for Hadoop (CDH). There has been a great influx of interest and many contributions, and in this post we will provide a quick summary of this month’s new developments. First, we’re happy to announce the availability of Flume v0.9.1, and we will describe some of its updates.

Read More