Open source messaging system.

Goal

Overview

Central concept - topic. It's ordered collection of messages (messages are guaranteed to be ordered per partition).

Multiple producers write to one topic. Multiple consumers read messages from it.

To achieve scalability and performance each topic is maintained as a partitioned log, which is stored across multiple nodes called brokers. Each partition is ordered, immutable, sequence of messages and each message has sequence id - offset assigned to it.

Consumers can produce messages in any order by providing offset. Thus consumers can replay the data.

The messages are stored durably for a configurable retention period.

Multiple consumer instances might create a consumer group. In the consumer group, the message is delivered if any of the consumers read it.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2ba03841-93da-497c-b845-248d042d7f3d/Untitled.png

Partitions are replicated for fault tolerance in a leader-follower fashion. Leader handles writes and reads and followers replicate it.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e44f11a6-d2d5-4be8-bd3e-c4734b4fe9da/Untitled.png

Kafka uses Zookeeper for leader election. But log replication is separated from key elements of consensus protocol, and Kafka uses own in sync replication. Replica in ISR is guaranteed to have data (since leader waits until all replicas respond), data is never lost.

Tuning

User can control replication factor, ISR set and number of replicas from ISR set that need to acknowledge record before it's commited (ack).