Category Archives: Flume

Flafka: Apache Flume Meets Apache Kafka for Event Processing

Categories: Flume Kafka

The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure.

In this previous post you learned some Apache Kafka basics and explored a scenario for using Kafka in an online application. This post takes you a step further and highlights the integration of Kafka with Apache Hadoop, demonstrating both a basic ingestion capability as well as how different open-source components can be easily combined to create a near-real time stream processing workflow using Kafka,

Read More

Apache Kafka for Beginners

Categories: Flume Kafka

When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration.

Apache Kafka is creating a lot of buzz these days. While LinkedIn, where Kafka was founded, is the most well known user, there are many companies successfully using this technology.

So now that the word is out, it seems the world wants to know: What does it do?

Read More

The New Apache Flume Book is in Early Release

Categories: Books Flume

Congratulations to Hari Shreedharan, Cloudera software engineer and Apache Flume committer/PMC member, for the early release of his new O’Reilly Media book, Using Flume: Stream Data into HDFS and HBase. It’s the seventh Hadoop ecosystem book so far that was authored by a current or former Cloudera employee (but who’s counting?).

Why did you decide to write this book?

Flume book

I have been working on Apache Flume for the past two years,

Read More

Email Indexing Using Cloudera Search

Categories: Flume Hadoop Kite SDK Search Use Case

Why would any company be interested in searching through its vast trove of email? A better question is: Why wouldn’t everybody be interested? 

Email has become the most widespread method of communication we have, so there is much value to be extracted by making all emails searchable and readily available for further analysis. Some common use cases that involve email analysis are fraud detection, customer sentiment and churn, lawsuit prevention, and that’s just the tip of the iceberg.

Read More

How-to: Analyze Twitter Data with Hue

Categories: Data Science Flume Hive How-to Hue

Hue 2.2 , the open source web-based interface that makes Apache Hadoop easier to use, lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features different applications like an Apache Hive editor and Apache Oozie dashboard and workflow builder.

This post is based on our “Analyzing Twitter Data with Hadoop” sample app and details how the same results can be achieved through Hue in a simpler way.

Read More