An ingest pattern that we commonly see being adopted at Cloudera customers is Apache Spark Streaming applications which read data from Kafka. Streaming data continuously from Kafka has many benefits such as having the capability to gather insights faster. However, users must take into consideration management of Kafka offsets in order to recover their streaming application from failures. In this post, we will provide an overview of Offset Management and following topics.
- Storing offsets in external data stores
- Not managing offsets
Overview of Offset Management
Spark Streaming integration with Kafka allows users to read messages from a single Kafka topic or multiple Kafka topics.