A detailed backend reference

Kafka, the log
at the heart of your system.

A first-principles walkthrough of the distributed event-streaming platform that backs the nervous systems of LinkedIn, Uber, Netflix and most modern fintech — from the one idea everything rests on (an append-only commit log), through partitions, consumer groups, offsets and delivery semantics, into retention, schemas, event-driven patterns, the outbox/CDC trick, and stream processing. Written to explain not just what each piece does but why it exists and how it works underneath. Producer/consumer code in both Go and Python.

Distributed commit log Partitions & consumer groups Go 1.22+ · Python 3.11+ 21 sections
Part I · The Log Model
01

What Kafka Actually Is

Strip away the marketing and Kafka is one thing: a distributed, append-only, durable log of events that many independent consumers can read at their own pace. Not a queue, not a database, not a message bus in the classic sense — a log. Producers append records to the end; consumers read forward from a position they track themselves; and the records stay after being read, for a configured retention. Once that single picture is in your head, every feature in this manual is a consequence of it.

The official description is "a distributed event streaming platform," and each word earns its place. Distributed: it runs as a cluster of brokers, replicating data for fault tolerance and scaling horizontally. Event streaming: it's built around an unbounded, continuous flow of immutable events (something happened: an order was placed, a payment cleared, a sensor read), not request/response calls. Platform: producers, consumers, a connector ecosystem, and stream processing all sit on the same log.

A traditional message queue is a to-do list: you take an item, do it, and it's gone. Kafka is a ship's logbook: every event is written down in order and stays in the book. Ten different officers can each read the log from wherever they left off, re-read yesterday's entries, or start from the beginning — and reading never erases a line.
The shape of Kafka
Producers append; the log persists; many consumers read independently
producers append → THE LOG (append-only, retained) e0 e1 e2 e3 e4 offset 0    1     2     3     4 → new writes go to the tail consumer A @ offset 4 consumer B @ offset 1 consumer C @ offset 0 Each consumer tracks its own position; reading e1 doesn't remove it — another consumer can still read it.
This decoupling — durable retained events plus per-consumer positions — is the whole game. It's what lets one stream of events feed many unrelated systems independently.
The one definition

Kafka is a distributed, durable, append-only commit log of immutable events, partitioned for scale and replicated for fault tolerance, that multiple independent consumers read at their own offset — with events retained after reading. Everything else is detail on top of that.

02

Queue vs Log — Why Kafka Is Different

This is the most important section in the manual, because conflating Kafka with a traditional message queue is the root of most confusion. The task-queues chapter covered the queue model in depth — producers, workers, brokers like RabbitMQ and SQS, retries, visibility timeouts. Kafka is a different shape of thing, and the difference is not a detail; it changes what problems it solves.

The fundamental contrast
A queue distributes & deletes; a log retains & replays
QUEUE (RabbitMQ, SQS) — competing consumers msg in m3m2m1 worker 1 worker 2 each message goes to ONE worker & is DELETED once acked no replay · order across consumers lost LOG (Kafka) — independent consumers e0e1e2e3 retained — nothing deleted on read billing reads all analytics reads all search reads all every consumer sees EVERY event · can replay from any offset
DimensionTraditional queue (ch.10)Kafka log
After a message is readDeleted (consumed once)Retained (until retention expires)
Who receives a messageOne competing consumerEvery consumer group, independently
Replay / re-readNo — it's goneYes — seek to any offset
Consumer positionBroker tracks & removesConsumer tracks its own offset
OrderingWeak/none across consumersStrict within a partition
Mental modelWork distribution (to-do list)Event history / source of truth (logbook)
Sweet spotTask offloading, job processingEvent streaming, many subscribers, replay, high throughput
They coexist — this isn't either/or

You'll often use both: Kafka as the durable event backbone (orders, payments, clicks streamed to many systems), and a task queue (Celery/SQS — ch.10) for discrete background jobs (send this email, resize that image). The full comparison and "which when" is §17. For now hold the distinction: queue = distribute work and forget; log = record events and let many systems consume & replay them.

03

The Commit Log

Everything Kafka does rests on the humblest data structure in computing: the append-only log — an ordered sequence of records you can only add to the end of, never insert into the middle of or modify. Each record gets a monotonically increasing offset (0, 1, 2, …) that is its permanent address. This is the same structure a database uses for its write-ahead log and durability; Kafka's insight was to make the log itself the primary, public abstraction rather than a hidden internal detail.

Append-only, offset-addressed
Writes go to the tail; the offset is a permanent address
rec rec rec rec rec new 012345 append only at the tail a reader advances forward by offset; it never goes back unless you seek No updates, no deletes-in-place, no random inserts — just append and read. That constraint is the superpower.

Why an append-only log is so fast and so useful

  • Sequential I/O is cheap. Appending to the end of a file and reading it forward are sequential disk operations — dramatically faster than random access, even on SSDs. Kafka leans on the OS page cache and zero-copy transfer to move data from disk to network without passing through application memory, which is how it sustains millions of messages per second on modest hardware.
  • Immutability is simple & safe. Records never change, so there are no locks for updates, no read/write conflicts, and concurrent readers never interfere. An immutable history is also trivially cacheable and easy to reason about.
  • The log is the history. Because nothing is overwritten, the log is a complete, ordered record of everything that happened — which is exactly what makes replay (§11), event sourcing (§14), and feeding new consumers from the beginning possible.
One structure, everything follows

Hold the append-only log firmly: writes append and get an offset; reads move forward; records are immutable and retained. Partitions (§4) are just multiple such logs; consumer groups (§7) are bookmarks into them; delivery semantics (§9) are about how carefully you track those bookmarks. The whole system is the log idea, scaled and replicated.

Part II · Architecture
04

Topics & Partitions

A topic is a named stream of related events — orders, payments, page-views. It's the logical channel producers write to and consumers read from. But a topic is not a single log; it's split into partitions, and the partition is where all the real mechanics live. The partition is simultaneously Kafka's unit of parallelism, of ordering, and of storage — three of the most important properties in the system, all riding on this one concept.

A topic is partitioned
One topic, several independent append-only logs; a key picks the partition
producerkey → hash TOPIC: orders (3 partitions) P0 P1 P2 Ordering is guaranteed WITHIN a partition, not across them. Records with the same key always land on the same partition, so all events for one order stay in order. partition = hash(key) % numPartitions

The three roles of a partition

  • Parallelism. Partitions can live on different brokers and be consumed by different consumers at once, so a topic's throughput scales with its partition count. More partitions → more parallel consumption (up to the consumer-group limit — §7).
  • Ordering. Kafka guarantees order within a single partition and makes no promise across partitions. This is the central trade-off (§10): to keep related events ordered, you route them to the same partition via a shared key (all events for order-42 use key "order-42", so they're strictly ordered relative to each other).
  • Storage. Each partition is physically a set of segment files on a broker's disk — the actual append-only log from §3. Retention and compaction (§11) operate per partition.
Partition count is a one-way-ish decision

You can increase a topic's partitions later, but doing so breaks key-to-partition stability (the modulo changes, so existing keys may move) and thus breaks ordering guarantees for in-flight keys. And you can't easily decrease them. So choose partition count deliberately up front based on target throughput and the maximum consumer parallelism you'll need — too few caps your scaling, too many adds overhead (§19, §20).

05

Brokers, Replication & Durability

A Kafka cluster is a set of brokers (servers); partitions are spread across them, and that's how the cluster scales and survives failure. Durability comes from replication: each partition has one leader replica and some follower replicas on other brokers. All reads and writes go to the leader; followers continuously copy from it. If the leader's broker dies, a follower is promoted — no data lost, brief unavailability. This is the same desired-state/failover thinking as the Kubernetes chapter, applied to data.

Replication
Each partition has a leader and in-sync followers on other brokers
Broker 1 P0 (leader) P1 (follower) P2 (follower) Broker 2 P0 (follower) P1 (leader) P2 (follower) Broker 3 P0 (follower) P1 (follower) P2 (leader) Leaders are spread across brokers for balance; each follower replicates its leader. Lose a broker → its leaders' followers take over.

The durability dials: acks & ISR

How safe a write is depends on two settings. The producer's acks says how many replicas must confirm before the write is considered done: acks=0 (fire-and-forget, may lose data), acks=1 (leader only — lost if the leader dies before followers copy it), or acks=all (the leader plus all in-sync replicas — the safe choice). The ISR (in-sync replica set) is the followers currently caught up with the leader; pairing acks=all with min.insync.replicas=2 means a write only succeeds if at least two replicas have it — surviving a single broker loss with zero data loss.

SettingGuaranteeTrade-off
acks=0None — fire and forgetFastest; can silently lose data
acks=1Leader has itFast; lost if leader fails pre-replication
acks=all + min.insync.replicas=2≥2 replicas have it before successSafest; slightly higher latency — the standard for important data
ZooKeeper is gone — KRaft

Older Kafka relied on a separate ZooKeeper cluster to store metadata and elect leaders. Modern Kafka uses KRaft (Kafka Raft), folding that role into the brokers themselves via a Raft-based controller quorum — fewer moving parts, faster failover, simpler ops. If you read older docs mentioning ZooKeeper, KRaft is its replacement.

06

Producers

A producer publishes records to a topic. Three producer concerns shape correctness and performance: the key (which decides the partition and thus ordering — §4), the acks (durability — §5), and batching (the producer groups records and compresses them for throughput). A record is a key/value pair plus optional headers and a timestamp; the value is your event payload (often Avro/Protobuf — §12).

// Using confluent-kafka-go (librdkafka). kafka-go (segmentio) is a pure-Go alternative.
p, _ := kafka.NewProducer(&kafka.ConfigMap{
    "bootstrap.servers": "localhost:9092",
    "acks":              "all",   // durability: leader + in-sync replicas (§5)
    "enable.idempotence": true,   // no duplicates on producer retry (§9)
    "linger.ms":         5,       // wait up to 5ms to batch records (throughput)
    "compression.type":  "zstd",
})
defer p.Close()

topic := "orders"
// The KEY decides the partition → all events for one order stay ordered (§4, §10).
err := p.Produce(&kafka.Message{
    TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
    Key:            []byte("order-42"),
    Value:          []byte(`{"event":"placed","amount":1999}`),
}, nil)
if err != nil { log.Fatal(err) }

// Producing is async + batched; Flush blocks until queued messages are delivered.
p.Flush(5000)

// Delivery reports tell you the final partition/offset (or an error) per message:
//   for e := range p.Events() { if m, ok := e.(*kafka.Message); ok { ... } }
# Using confluent-kafka (librdkafka). kafka-python / aiokafka are alternatives.
from confluent_kafka import Producer

p = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",                 # durability: leader + in-sync replicas (§5)
    "enable.idempotence": True,    # no duplicates on producer retry (§9)
    "linger.ms": 5,                # batch window for throughput
    "compression.type": "zstd",
})

def on_delivery(err, msg):         # async callback: final partition/offset or error
    if err:
        print("delivery failed:", err)
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

# The KEY decides the partition → all events for one order stay ordered (§4, §10).
p.produce(
    "orders",
    key=b"order-42",
    value=b'{"event":"placed","amount":1999}',
    on_delivery=on_delivery,
)
p.flush()                          # block until queued messages are delivered
No key = round-robin = no ordering

If you don't set a key, records are spread across partitions (round-robin / sticky), maximizing parallelism but giving up any ordering relationship between them. That's fine for independent events; it's a bug if those events needed to stay ordered (all updates to one account, say). Key by your ordering/entity boundary — usually the entity ID — whenever order matters (§10).

07

Consumers & Consumer Groups

A consumer reads records from partitions. The powerful abstraction is the consumer group: a set of consumers sharing a group ID that cooperatively divide a topic's partitions among themselves so each partition is read by exactly one consumer in the group. Add consumers to the group and Kafka rebalances, reassigning partitions to spread the load. This is how you scale consumption horizontally — and it's the mechanism behind both work-sharing and broadcast.

Consumer groups
Within a group, partitions are split; different groups each get everything
topic: 4 partitions P0 P1 P2 P3 group A (2 consumers) c1 ← P0,P1 c2 ← P2,P3 group B (1 consumer) c1 ← P0,P1,P2,P3 group A splits the load (scale out within a group) group B is independent — it sees ALL events too Same group = work-sharing (each event once). Different groups = broadcast (each group gets everything). max useful consumers in a group = number of partitions (extra consumers sit idle)
c, _ := kafka.NewConsumer(&kafka.ConfigMap{
    "bootstrap.servers": "localhost:9092",
    "group.id":          "billing",        // the consumer GROUP — scale by adding instances
    "auto.offset.reset": "earliest",       // where to start if no committed offset (§8)
    "enable.auto.commit": false,           // we'll commit manually for at-least-once (§9)
})
defer c.Close()
c.Subscribe("orders", nil)                 // Kafka assigns this instance some partitions

for {
    msg, err := c.ReadMessage(-1)          // blocks for the next record
    if err != nil { continue }

    process(msg.Value)                     // do the work FIRST...
    c.CommitMessage(msg)                   // ...THEN commit the offset (at-least-once, §9)
}
from confluent_kafka import Consumer

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing",                 # the consumer GROUP — scale by adding instances
    "auto.offset.reset": "earliest",       # where to start if no committed offset (§8)
    "enable.auto.commit": False,           # commit manually for at-least-once (§9)
})
c.subscribe(["orders"])                    # Kafka assigns this instance some partitions

try:
    while True:
        msg = c.poll(1.0)                  # wait up to 1s for the next record
        if msg is None:
            continue
        if msg.error():
            print("error:", msg.error()); continue

        process(msg.value())               # do the work FIRST...
        c.commit(msg)                      # ...THEN commit the offset (at-least-once, §9)
finally:
    c.close()                              # leave the group cleanly → triggers a rebalance
Partitions cap your parallelism

Because one partition is consumed by at most one consumer per group, a group can usefully have at most as many active consumers as the topic has partitions. A topic with 4 partitions and 6 group members leaves 2 consumers idle. So your maximum consumption parallelism is set by partition count (§4) — another reason to size partitions for your peak throughput up front (§20).

Part III · Guarantees
08

Offsets & Offset Management

An offset is a record's position in its partition (§3). The consumer's job is to track which offset it has processed up to — its committed offset — so that on restart or rebalance it resumes from the right place instead of re-reading everything or skipping records. Kafka stores these committed offsets in an internal topic (__consumer_offsets), keyed by group + partition. The gap between the latest offset in a partition and a group's committed offset is consumer lag — the single most important health metric for a consumer (§19).

Position & lag
Committed offset, current position, and the lag to the tail
processed & committed read, not yet committed unread (lag) committed offset current position log end (tail) LAG = tail − position Growing lag means the consumer is falling behind

When you commit decides your guarantee

The choice between auto-commit (Kafka periodically commits in the background) and manual commit (you commit explicitly) is really a choice about delivery semantics (§9). The ordering of "process the message" versus "commit the offset" is everything: commit after processing and a crash in between means you'll re-process (at-least-once); commit before processing and a crash means you'll skip (at-most-once). Auto-commit on a timer can do either depending on timing, which is why correctness-sensitive consumers commit manually after the work is done (as in §7's code).

Offsets are per group, and you can seek

Because each group has its own committed offsets, a brand-new group can start from earliest and replay the entire retained history — the basis for bootstrapping a new consumer or rebuilding a view (§14). You can also explicitly seek a consumer to any offset or timestamp to reprocess a range. The log's retained immutability (§3) is what makes this possible; a queue can't do it.

09

Delivery Semantics

The question every distributed messaging system must answer: if things fail and retry, how many times might a message be delivered? There are three possible guarantees, and Kafka can give you any of them depending on configuration — so you must choose consciously based on what your processing can tolerate.

Three semantics
The trade-off between losing messages and duplicating them
At-most-once commit BEFORE processing may LOSE messages never duplicates ok for lossy metrics At-least-once commit AFTER processing never loses may DUPLICATE the common default + idempotency Exactly-once idempotent producer + txns no loss, no dupes costlier · Kafka-to-Kafka read-process-write pipelines Most systems choose at-least-once and make processing idempotent — the pragmatic sweet spot.
At-least-once + idempotent consumers = the practical answer

True end-to-end exactly-once is narrow: Kafka's enable.idempotence stops producer duplicates, and transactions give exactly-once for Kafka-to-Kafka stream processing (consume, transform, produce, and commit offsets atomically). But the moment your consumer writes to an external system (a database, a payment API), exactly-once across that boundary isn't free. The battle-tested approach: choose at-least-once and make your processing idempotent — safe to apply the same event twice (dedupe by event ID, upsert instead of insert). This is the same idempotency discipline as the task-queues chapter (§14 there) and REST idempotency keys, and it's why duplicates become harmless rather than catastrophic.

making at-least-once safe: idempotent processing (pseudocode, both languages)
-- The consumer may see the same event twice on retry/rebalance.
-- Make the write idempotent so a duplicate is a no-op:

-- (a) dedupe by a unique event id
INSERT INTO processed_events (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING;        -- second delivery inserts nothing
-- ...only do the side effect if the insert affected a row.

-- (b) or make the effect itself idempotent — upsert the end state, don't increment
INSERT INTO account_balance (account_id, balance) VALUES ($1, $2)
ON CONFLICT (account_id) DO UPDATE SET balance = EXCLUDED.balance;
-- applying the same "balance = 100" twice lands on the same state.
10

Ordering Guarantees

One sentence to memorize: Kafka guarantees ordering within a partition, and nowhere else. Records in partition P are delivered to a consumer in the exact offset order they were written. Across partitions of the same topic, there is no global ordering — records interleave arbitrarily. This isn't a limitation to work around so much as the direct price of partition-level parallelism (§4): you can have total ordering, or you can have horizontal scale, but not both at once for the same stream.

Ordering vs parallelism
Keying preserves per-entity order; the trade-off is partition skew
key = order id → same order's events stay together & ordered producer P0 o42:placedo42:paido42:shipped P1 o88:placedo88:paid o42 strictly ordered in P0; o88 strictly ordered in P1; P0 vs P1 order = undefined Risk: a "hot" key (one huge customer) overloads its single partition — partition skew (§18). Choose the key at the granularity where order matters but load still spreads.
Two ordering foot-guns

First, key skew: if order requires keying by a coarse value (e.g. one mega-tenant), all that tenant's traffic funnels into one partition, capping its throughput regardless of how many partitions exist (§18). Second, retries can reorder: with a non-idempotent producer and multiple in-flight requests, a retried message can land after a later one. Enabling enable.idempotence (§6) preserves per-partition order even across retries — another reason it's a sensible default.

11

Retention & Log Compaction

Since records are retained after reading (§2), Kafka needs rules for when to remove them — otherwise the log grows forever. There are two cleanup policies, and they answer different needs. Time/size retention (the default) keeps records for a window — say 7 days or 50 GB per partition — then deletes the oldest. Log compaction is cleverer: it keeps the latest record for each key and garbage-collects older versions of that key, turning the log into a continuously-updated snapshot of "the current value per key."

Retention vs compaction
Delete by age, or keep the latest value per key
delete policy (time/size) — drop the oldest oldoldkeepkeepkeep records past the window are deleted regardless of key compact policy — keep latest per KEY before: k=A v=1k=B v=1k=A v=2k=A v=3 after: k=B v=1k=A v=3 only the newest value of each key survives — a live snapshot Compaction makes a topic usable as a durable, replayable key/value store of current state.
Compaction powers state & the replay story

A compacted topic is how Kafka holds current state forever without unbounded growth — the basis for event-sourcing snapshots (§14), Kafka Streams' state stores and KTables (§16), and config/lookup data that a new consumer can replay from scratch to rebuild a complete picture. Time-retention topics answer "what happened recently"; compacted topics answer "what is the latest state of everything." Pick the policy by which question your topic exists to answer.

Part IV · Schemas & Patterns
12

Schemas & the Schema Registry

Kafka itself treats record values as opaque bytes — it neither knows nor cares what's inside. That freedom is a trap at scale: if producers and consumers don't agree on the payload's shape, a producer changing a field silently breaks every downstream consumer (the same contract-drift danger the gRPC and serialization chapters warned about, now spread across many independent teams). The fix is a schema for each topic plus a Schema Registry that stores schemas, assigns them IDs, and enforces compatibility as they evolve.

Schema Registry
Producers and consumers share a registered, versioned schema
producerserialize + schema id Schema Registryversioned · checks compat topic (bytes + id)Kafka stores only bytes consumerfetch schema, decode register produce fetch by id The registry rejects an incompatible schema change — breaking changes fail at deploy, not in production.

Formats & compatibility

The common serialization formats are Avro (the traditional Kafka pairing, compact with a rich schema-evolution story), Protobuf (the same IDL as the gRPC chapter, increasingly popular), and JSON Schema (readable, larger). The registry enforces a compatibility mode — typically backward compatibility, meaning new consumers can read old data: you may add optional fields and remove ones with defaults, but you may not rename or retype a field. This is precisely the additive-change discipline from the gRPC chapter (§04 there: field numbers, never reuse) and REST versioning, enforced automatically by a server.

A schema is a cross-team contract

In a microservice fleet, a topic's schema is a public API that many independent teams depend on. The Schema Registry turns "please don't break the payload" from a tribal convention into a machine-checked gate — the same role buf plays for gRPC .proto files. Treat schema changes with the same care as any API change: additive and compatible by default.

13

Event-Driven Patterns

Kafka is the substrate for event-driven architecture (EDA), where services communicate by emitting and reacting to events rather than calling each other directly. Instead of the order service commanding the email, inventory, and analytics services (tight coupling, the order service must know them all), it simply announces "OrderPlaced" to a topic, and any interested service subscribes. The producer doesn't know — or care — who consumes. This is the deep payoff of the log's decoupling (§1): you add a new consumer without touching the producer.

Choreography
One event, many independent reactions — the producer is unaware
order serviceemits OrderPlaced topic: orders email service inventory service analytics service Add a "loyalty-points" consumer tomorrow — zero changes to the order service.
PatternWhat the event carriesTrade-off
Event notificationJust "X happened" + an IDTiny messages; consumers must call back for details (coupling, load)
Event-carried state transferThe full relevant state in the eventConsumers are self-sufficient (no callbacks); larger messages, possible duplication
ChoreographyServices react to events autonomouslyLoose coupling, easy to extend; flow is harder to see end-to-end
OrchestrationA coordinator drives the steps (a saga)Explicit, visible flow; reintroduces a central coordinator
Decoupling isn't free

EDA trades the clarity of a direct call for flexibility. The downside is that no single place describes the whole workflow — "what happens when an order is placed?" is now spread across many subscribers, which makes debugging and reasoning harder. Mitigate with good observability (trace/correlation IDs through event headers, the same idea as §13/§15 of the layers and logging chapters), clear event schemas (§12), and choosing orchestration (an explicit saga) when a flow genuinely needs a visible, coordinated sequence.

14

Event Sourcing & CQRS

Two advanced patterns the log enables, often confused, frequently paired. Event sourcing stores state as the sequence of events that produced it rather than just the current snapshot: instead of a row balance = 100, you keep Deposited 60, Deposited 50, Withdrew 10 — and derive the balance by replaying them. Kafka's immutable, ordered, retained log (§3) is a natural event store. CQRS (Command Query Responsibility Segregation) separates the write model from one or more read models optimized for querying.

Event sourcing + CQRS
The event log is the source of truth; read models are derived projections
command"deposit 60" event logSOURCE OF TRUTH (append-only) read model: search indexprojection for fast queries read model: SQL viewprojection for reporting Lost a read model? Rebuild it by replaying the log from offset 0. The events are the truth; views are disposable.

The benefits are real: a complete audit trail (every change is a recorded fact — gold for finance and compliance), the ability to reconstruct state at any past point by replaying to an offset, and multiple specialized read models each rebuildable from the same log (compaction — §11 — provides the snapshot to avoid replaying from the dawn of time).

Powerful, and not free — don't reach for it by default

Event sourcing adds real complexity: schema evolution of historical events is hard (you can never change the past, only how you interpret it), rebuilding state has a cost, and "what's the current value?" now requires a projection rather than a row read. It shines for audit-heavy, complex domains (payments, ledgers — squarely fintech) where the history is the value. For ordinary CRUD, a normal database is simpler and correct — use the pattern where the audit trail and replay genuinely pay for the complexity, not as a default architecture.

15

The Outbox Pattern & Change Data Capture

A subtle, important problem in event-driven systems: when a service must both update its database and publish an event, doing them as two separate operations is unsafe. If the DB commit succeeds but the Kafka publish fails (or vice versa), your systems diverge — the order exists but no "OrderPlaced" event fired, or an event fired for an order that rolled back. This is the dual-write problem, and you can't fix it by reordering the two writes — there's always a crash window between them.

The dual-write problem & the outbox fix
Write the event into the same DB transaction; relay it to Kafka afterward
naive dual write — can diverge commit DB ⚡ crash here publish Kafka DB updated but event never sent outbox pattern — atomic ONE DB transaction orders row outbox row relay / CDC Kafka topic Both rows commit together or not at all; a separate relay reliably forwards the outbox — no lost or phantom events.

The outbox pattern solves it with a single local transaction: write your business change and an "outbox" row describing the event into the same database transaction — so they commit atomically. A separate relay process then reads new outbox rows and publishes them to Kafka (retrying until success, at-least-once — hence consumers stay idempotent, §9). The modern, low-friction way to run that relay is Change Data Capture (CDC): a tool like Debezium tails the database's transaction log and streams committed changes straight into Kafka — no application polling required.

outbox: business write + event in one transaction
BEGIN;

-- 1) the actual business change
INSERT INTO orders (id, customer_id, amount, status)
VALUES ('order-42', 'cust-7', 1999, 'PLACED');

-- 2) the event to publish, written in the SAME transaction → atomic with (1)
INSERT INTO outbox (id, aggregate, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'order-42', 'OrderPlaced',
        '{"id":"order-42","amount":1999}', now());

COMMIT;   -- both succeed together, or neither does — no dual-write gap

-- A relay (or Debezium CDC on the transaction log) then reads new `outbox`
-- rows and produces them to Kafka, marking each published. Retries are safe
-- because downstream consumers are idempotent (§9).
The standard answer to "DB + Kafka together"

Whenever a write must atomically produce an event — ubiquitous in payments and order systems — reach for the outbox (often via Debezium CDC). It converts an impossible distributed-transaction problem into an ordinary local transaction plus a reliable, idempotent relay. This is one of the most practically valuable patterns in the whole event-driven toolkit.

Part V · Processing, Ops & Practice
16

Stream Processing

So far consumers read events one at a time. Stream processing is the discipline of computing continuously over the stream — filtering, transforming, aggregating, joining — producing new derived streams. Instead of "read event, do a thing," you express a standing query like "a 5-minute rolling count of failed payments per merchant" that updates forever as events arrive. Kafka's own library for this is Kafka Streams (JVM); ksqlDB offers a SQL-like layer, and Apache Flink is the heavyweight for large, complex jobs.

ConceptMeaning
KStreamAn unbounded stream of independent events (an append log view) — every record matters.
KTableA changelog interpreted as current state per key — the compacted-topic view (§11); latest value wins.
Stateful opsAggregations/joins that remember across events, backed by a local state store (itself a compacted Kafka topic for fault tolerance).
WindowingBucketing events by time (tumbling, hopping, session) to aggregate "per 5 minutes," etc.
Event vs processing timeWhen the event happened vs when you process it — they differ, and late/out-of-order data must be handled.
The stream/table duality

The elegant core idea: a stream and a table are two views of the same data. A stream of changes aggregated by key gives a table (current state); a table's changelog is a stream of updates. This is exactly the retention-vs-compaction distinction (§11) raised to a programming model — and it's why a compacted topic is a table and a normal topic is a stream. Most backend engineers consume streams with the plain consumer (§7); reach for a stream- processing framework when you need stateful aggregation, joins, or windowing that would be painful to hand-roll.

17

Kafka vs Traditional Queues

Returning to the §2 distinction with enough context to decide well. This is the question that sends people to the task-queues chapter and back: do I want Kafka or a message queue (RabbitMQ, SQS, the Celery/Sidekiq world)? They overlap enough to confuse and differ enough to matter.

DimensionKafka (log)RabbitMQ / SQS (queue)
Core modelDurable, replayable event logTransient work queue
After consumptionRetained; replayable by any groupDeleted once acked
Fan-out to many systemsNative — each group reads allNeeds exchanges/fan-out config; not the default
ThroughputVery high (millions/sec)High, but typically lower
OrderingStrong per partitionWeak; FIFO modes are limited/slower
Per-message routing/priority/TTLLimited — not its modelRich (routing keys, priority, delay, DLQ)
Replay / time-travelYes — seek to any offsetNo
Best forEvent streaming, many subscribers, replay, analytics, event sourcingTask offloading, RPC-ish jobs, complex routing, per-message control
The decision heuristic

Reach for Kafka when events are facts many systems care about, when you need high throughput, ordering, or the ability to replay history — the streaming backbone. Reach for a queue (ch.10) when you're handing off discrete units of work to be done once, especially with per-message routing, priorities, delays, or dead-letter handling. They're complementary: a great many real systems run Kafka as the event spine and a task queue for background jobs. Don't force task-queue semantics onto Kafka (per-message TTL, priority, selective ack) — that's swimming against its design.

18

Consumer Pitfalls

Most production Kafka pain shows up on the consumer side. These are the failure modes that bite real teams — worth recognizing before they page you at 2am.

PitfallWhat happensMitigation
Rebalancing stormsFrequent group rebalances (slow processing trips the poll timeout, member declared dead) stall consumption repeatedlyKeep per-poll work bounded, tune max.poll.records / max.poll.interval.ms; offload slow work; use cooperative/incremental rebalancing
Poison pillOne un-processable message (bad schema, bad data) makes the consumer crash & retry forever — the partition is stuckCatch per-record errors; route the bad record to a dead-letter topic and move on
Consumer lagProducers outpace consumers; lag grows unbounded; data gets staleAdd partitions + consumers (§7), speed up processing, batch; alert on lag (§19)
Partition / key skewA hot key (one giant tenant) overloads its single partition while others idle (§10)Choose a higher-cardinality key, or a composite key; isolate hot tenants
Slow processing blocks orderTo keep order you process a partition serially, so one slow record delays the restParallelize across partitions, not within; consider per-key parallelism patterns
Reprocessing on commit gapsA crash between processing and offset commit re-delivers recordsExpected under at-least-once — make processing idempotent (§9)
Always have a dead-letter strategy

A single malformed message must never be able to halt a whole partition indefinitely — that's a poison pill, and it's a common outage cause. Wrap per-record processing in error handling: on a persistent failure, publish the offending record (plus the error and metadata) to a dead-letter topic, commit past it, and keep the stream flowing. You investigate the dead-letter topic later instead of letting one bad record block everyone — the same DLQ discipline the task-queues chapter applies to failed jobs.

19

Operations & Monitoring

Running Kafka well comes down to watching a few signals and sizing a few things correctly. You don't need to memorize every metric, but you must know the ones that predict trouble.

WatchWhy it matters
Consumer lagThe headline health metric (§8): growing lag means consumers can't keep up — data is getting stale. Alert on it.
Under-replicated partitionsReplicas falling out of the ISR (§5) — durability is at risk; a broker may be struggling.
Broker disk & throughputRetention (§11) means topics consume real disk; saturated disk or network throttles the cluster.
Rebalance frequencyFrequent rebalances (§18) indicate unstable consumers and cause stalls.
Request latency & errorsProduce/fetch latency and error rates surface broker or client problems early.

Sizing decisions you actually make

  • Partition count — sets max consumer parallelism and throughput (§4, §7). Estimate from target throughput and peak consumer count; err slightly high, but not wildly (each partition has overhead, and thousands per broker hurt). Hard to change later, so think ahead (§20).
  • Replication factor — usually 3 in production: tolerates one broker loss with room to spare. Pair with min.insync.replicas=2 and acks=all (§5).
  • Retention — how long events live (and thus disk usage and replay window). Set per topic by how far back consumers might need to read.
Tooling

Standard observability: export broker and client JMX metrics to Prometheus + Grafana (the same stack the task-queues and logging chapters use), and watch consumer lag specifically with a lag exporter or a UI like Kafka UI / Conduktor / AKHQ. kafka-consumer-groups.sh --describe gives a quick CLI read of a group's lag per partition when you're debugging live.

20

Designing with Kafka

Pulling the manual into design judgment. Kafka is a powerful, sharp tool; used where it fits it's transformative, used as a generic message bus it's a liability. A few principles separate good Kafka designs from painful ones.

Design principles

  • Model topics around events/facts, not commands or RPC. A topic is a stream of "things that happened" (PaymentSettled), consumed by whoever cares — not a way to tell a specific service what to do. If you find yourself wanting request/response, you want gRPC/REST, not Kafka.
  • Choose the partition key by your ordering boundary. Key by the entity whose events must stay ordered (account, order, user) — but at a granularity that still spreads load, to avoid hot partitions (§10, §18). The key is the single most consequential per-topic decision.
  • Assume at-least-once; build idempotent consumers. Don't chase end-to-end exactly-once across external systems (§9). Dedupe by event ID or use upserts so duplicates are harmless — cheaper and more robust.
  • Govern schemas from day one. Use the Schema Registry with backward compatibility (§12) so independent teams can evolve safely. A topic's schema is a contract.
  • Use the outbox for DB-and-event writes. Never dual-write; make the event atomic with the data and relay it (§15).
  • Size partitions for peak up front. Repartitioning breaks key ordering (§4), so plan for growth rather than reacting to it.
When Kafka, and when not

Reach for Kafka when you have a stream of events many systems consume, need high throughput or replay, are doing event-driven architecture, event sourcing, analytics pipelines, or log/metric ingestion. Don't reach for it for simple request/response (use gRPC/REST — the gRPC chapter), for discrete background jobs with per-message routing or priority (use a task queue — ch.10), or as your only datastore for point lookups (use a database). Matching the tool to the shape of the problem is the whole skill.

21

Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

ConceptOne-liner
KafkaA distributed, durable, append-only commit log of events that many consumers read at their own offset.
Queue vs logQueue: consumed once & deleted. Log: retained, replayable, every group reads all.
Commit logAppend-only, immutable, offset-addressed — sequential I/O, the basis of everything.
TopicA named stream, split into partitions.
PartitionUnit of parallelism, ordering, and storage. Same key → same partition.
Broker / replicationCluster of servers; each partition has a leader + follower replicas (ISR).
acks / ISRacks=all + min.insync.replicas=2 = no data loss on one broker failure.
ProducerAppends keyed records; key sets partition; idempotence avoids dup on retry.
Consumer groupSplits partitions among members. Same group = share work; different groups = broadcast.
Parallelism capMax useful consumers per group = partition count.
Offset / lagYour position per partition; lag = tail − position = the key health metric.
Commit timingCommit after processing = at-least-once; before = at-most-once.
DeliveryAt-most / at-least / exactly-once. Pick at-least-once + idempotent consumers.
OrderingGuaranteed within a partition only. Key by your ordering boundary.
RetentionDelete by time/size, or compact to keep latest value per key.
CompactionLatest-per-key → a durable, replayable snapshot of current state (a table).
Schema RegistryVersioned schemas (Avro/Protobuf) with enforced backward compatibility.
EDAEmit events; consumers react independently — loose coupling, easy to extend.
Event sourcing / CQRSLog is the source of truth; read models are rebuildable projections.
Outbox / CDCWrite data + event in one DB txn; relay (Debezium) to Kafka — no dual-write gap.
Stream processingContinuous compute (filter/aggregate/join/window); KStream vs KTable duality.
PitfallsRebalance storms, poison pills (use a DLQ), lag, key skew.
vs queuesKafka = event streaming/replay/throughput; queues = work + routing/priority. Use both.

The whole topic in one breath: Kafka is a distributed, durable, append-only commit log (§1–3) — unlike a queue, events are retained and every consumer group reads all of them and can replay (§2). A topic is split into partitions, the unit of parallelism, ordering, and storage (§4), spread across brokers and replicated for durability via leaders, followers, and the ISR (§5). Producers append keyed records (§6); consumer groups divide partitions to scale out, with parallelism capped by partition count (§7). Each consumer tracks its offset, and the lag to the tail is the key health signal (§8); when you commit decides your delivery semantics, and the pragmatic choice is at-least-once with idempotent consumers (§9). Ordering holds only within a partition, so you key by your ordering boundary and watch for skew (§10); retention deletes by age while compaction keeps the latest value per key (§11). A Schema Registry governs payloads across teams (§12); the log underpins event-driven architecture (§13), event sourcing and CQRS (§14), and the outbox/CDC pattern for atomic DB-and-event writes (§15). Stream processing computes continuously over it (§16); choose Kafka over a queue for streaming, replay, and throughput (§17), guard against rebalance storms, poison pills, lag, and skew (§18), monitor lag and ISR (§19), and design topics around events keyed sensibly, with idempotent consumers and governed schemas (§20).

Grounded in the Apache Kafka & Confluent docs · the log/stream-table literature (Kreps) · Debezium for CDC · Go 1.22+ (confluent-kafka-go) / Python 3.11+ (confluent-kafka) examples.