Category Archives: Kafka

Building Lambda Architecture with Spark Streaming

Categories: Kafka Spark

The versatility of Apache Spark’s API for both batch/ETL and streaming workloads brings the promise of lambda architecture to the real world.

Few things help you concentrate like a last-minute change to a major project.

One time, after working with a customer for three weeks to design and implement a proof-of-concept data ingest pipeline, the customer’s chief architect told us:

You know, I really like the design –

Read More

Jay Kreps, Apache Kafka Architect, Visits Cloudera

Categories: Cloudera Life Hadoop Kafka

It was good to see Jay Kreps (@jaykreps), the LinkedIn engineer who is the tech lead for that company’s online data infrastructure, visit Cloudera Engineering yesterday to spread the good word about Apache Kafka.

Kafka, of course, was originally developed inside LinkedIn and entered the Apache Incubator in 2011. Today, it is being widely adopted as a pub/sub framework that works at massive scale (and which is commonly used to write to Apache Hadoop clusters,

Read More