Category Archives: Kafka

Exactly-once Spark Streaming from Apache Kafka

Categories: Guest Kafka Spark

Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4.

The new release of Apache Spark, 1.3, includes new experimental RDD and DStream implementations for reading data from Apache Kafka. As the primary author of those features, I’d like to explain their implementation and usage. You may be interested if you would benefit from:

  • More uniform usage of Spark cluster resources when consuming from Kafka
  • Control of message delivery semantics
  • Delivery guarantees without reliance on a write-ahead log in HDFS
  • Access to message metadata

I’ll assume you’re familiar with the Spark Streaming docs and Kafka docs.

Read more

How-to: Do Real-Time Log Analytics with Apache Kafka, Cloudera Search, and Hue

Categories: Data Ingestion How-to Hue Kafka Search

Cloudera recently announced formal support for Apache Kafka. This simple use case illustrates how to make web log analysis, powered in part by Kafka, one of your first steps in a pervasive analytics journey.

If you are not looking at your company’s operational logs, then you are at a competitive disadvantage in your industry. Web server logs, application logs, and system logs are all valuable sources of operational intelligence,

Read more

How-to: Deploy and Configure Apache Kafka in Cloudera Enterprise

Categories: How-to Kafka

With Kafka now formally integrated with, and supported as part of, Cloudera Enterprise, what’s the best way to deploy and configure it?

Earlier today, Cloudera announced that, following an incubation period in Cloudera Labs, Apache Kafka is now fully integrated into Cloudera’s Big Data platform, Cloudera Enterprise (CDH + Cloudera Manager). Our customers have expressed strong interest in Kafka, and some are already running Kafka in production.

Read more

How Cerner Uses CDH with Apache Kafka

Categories: CDH Guest Kafka Use Case

Our thanks to Micah Whitacre, a senior software architect on Cerner Corp.’s Big Data Platforms team, for the post below about Cerner’s use case for CDH + Apache Kafka. (Kafka integration with CDH is currently incubating in Cloudera Labs.)

Over the years, Cerner Corp., a leading Healthcare IT provider, has utilized several of the core technologies available in CDH, Cloudera’s software platform containing Apache Hadoop and related projects—including HDFS,

Read more

Flafka: Apache Flume Meets Apache Kafka for Event Processing

Categories: Flume Kafka

The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure.

In this previous post you learned some Apache Kafka basics and explored a scenario for using Kafka in an online application. This post takes you a step further and highlights the integration of Kafka with Apache Hadoop, demonstrating both a basic ingestion capability as well as how different open-source components can be easily combined to create a near-real time stream processing workflow using Kafka,

Read more