Thanks to Cody Koeninger, Senior Software Engineer at Kixer, for the guest post below about Apache Kafka integration points in Apache Spark 1.3. Spark 1.3 will ship in CDH 5.4.
Apache Spark 1.3 includes new experimental RDD and DStream implementations for reading data from Apache Kafka. As the primary author of those features, I’d like to explain their implementation and usage. You may be interested if you would benefit from:
- More uniform usage of Spark cluster resources when consuming from Kafka
- Control of message delivery semantics
- Delivery guarantees without reliance on a write-ahead log in HDFS
- Access to message metadata