A detailed backend reference

Kafka, the log
at the heart of your system.

A first-principles walkthrough of the distributed event-streaming platform that backs the nervous systems of LinkedIn, Uber, Netflix and most modern fintech — from the one idea everything rests on (an append-only commit log), through partitions, consumer groups, offsets and delivery semantics, into retention, schemas, event-driven patterns, the outbox/CDC trick, and stream processing. Written to explain not just what each piece does but why it exists and how it works underneath. Producer/consumer code in both Go and Python.

Distributed commit log Partitions & consumer groups Go 1.22+ · Python 3.11+ 21 sections

Part I · The Log Model

What Kafka Actually Is

Strip away the marketing and Kafka is one thing: a distributed, append-only, durable log of events that many independent consumers can read at their own pace. Not a queue, not a database, not a message bus in the classic sense — a log. Producers append records to the end; consumers read forward from a position they track themselves; and the records stay after being read, for a configured retention. Once that single picture is in your head, every feature in this manual is a consequence of it.

The official description is "a distributed event streaming platform," and each word earns its place. Distributed: it runs as a cluster of brokers, replicating data for fault tolerance and scaling horizontally. Event streaming: it's built around an unbounded, continuous flow of immutable events (something happened: an order was placed, a payment cleared, a sensor read), not request/response calls. Platform: producers, consumers, a connector ecosystem, and stream processing all sit on the same log.

A traditional message queue is a to-do list: you take an item, do it, and it's gone. Kafka is a ship's logbook: every event is written down in order and stays in the book. Ten different officers can each read the log from wherever they left off, re-read yesterday's entries, or start from the beginning — and reading never erases a line.

The shape of Kafka

Producers append; the log persists; many consumers read independently

This decoupling — durable retained events plus per-consumer positions — is the whole game. It's what lets one stream of events feed many unrelated systems independently.

The one definition

Kafka is a distributed, durable, append-only commit log of immutable events, partitioned for scale and replicated for fault tolerance, that multiple independent consumers read at their own offset — with events retained after reading. Everything else is detail on top of that.

Queue vs Log — Why Kafka Is Different

This is the most important section in the manual, because conflating Kafka with a traditional message queue is the root of most confusion. The task-queues chapter covered the queue model in depth — producers, workers, brokers like RabbitMQ and SQS, retries, visibility timeouts. Kafka is a different shape of thing, and the difference is not a detail; it changes what problems it solves.

The fundamental contrast

A queue distributes & deletes; a log retains & replays

Dimension	Traditional queue (ch.10)	Kafka log
After a message is read	Deleted (consumed once)	Retained (until retention expires)
Who receives a message	One competing consumer	Every consumer group, independently
Replay / re-read	No — it's gone	Yes — seek to any offset
Consumer position	Broker tracks & removes	Consumer tracks its own offset
Ordering	Weak/none across consumers	Strict within a partition
Mental model	Work distribution (to-do list)	Event history / source of truth (logbook)
Sweet spot	Task offloading, job processing	Event streaming, many subscribers, replay, high throughput

They coexist — this isn't either/or

You'll often use both: Kafka as the durable event backbone (orders, payments, clicks streamed to many systems), and a task queue (Celery/SQS — ch.10) for discrete background jobs (send this email, resize that image). The full comparison and "which when" is §17. For now hold the distinction: queue = distribute work and forget; log = record events and let many systems consume & replay them.

The Commit Log

Everything Kafka does rests on the humblest data structure in computing: the append-only log — an ordered sequence of records you can only add to the end of, never insert into the middle of or modify. Each record gets a monotonically increasing offset (0, 1, 2, …) that is its permanent address. This is the same structure a database uses for its write-ahead log and durability; Kafka's insight was to make the log itself the primary, public abstraction rather than a hidden internal detail.

Append-only, offset-addressed

Writes go to the tail; the offset is a permanent address

Why an append-only log is so fast and so useful

Sequential I/O is cheap. Appending to the end of a file and reading it forward are sequential disk operations — dramatically faster than random access, even on SSDs. Kafka leans on the OS page cache and zero-copy transfer to move data from disk to network without passing through application memory, which is how it sustains millions of messages per second on modest hardware.
Immutability is simple & safe. Records never change, so there are no locks for updates, no read/write conflicts, and concurrent readers never interfere. An immutable history is also trivially cacheable and easy to reason about.
The log is the history. Because nothing is overwritten, the log is a complete, ordered record of everything that happened — which is exactly what makes replay (§11), event sourcing (§14), and feeding new consumers from the beginning possible.

One structure, everything follows

Hold the append-only log firmly: writes append and get an offset; reads move forward; records are immutable and retained. Partitions (§4) are just multiple such logs; consumer groups (§7) are bookmarks into them; delivery semantics (§9) are about how carefully you track those bookmarks. The whole system is the log idea, scaled and replicated.

Part II · Architecture

Topics & Partitions

A topic is a named stream of related events — orders, payments, page-views. It's the logical channel producers write to and consumers read from. But a topic is not a single log; it's split into partitions, and the partition is where all the real mechanics live. The partition is simultaneously Kafka's unit of parallelism, of ordering, and of storage — three of the most important properties in the system, all riding on this one concept.

A topic is partitioned

One topic, several independent append-only logs; a key picks the partition

The three roles of a partition

Parallelism. Partitions can live on different brokers and be consumed by different consumers at once, so a topic's throughput scales with its partition count. More partitions → more parallel consumption (up to the consumer-group limit — §7).
Ordering. Kafka guarantees order within a single partition and makes no promise across partitions. This is the central trade-off (§10): to keep related events ordered, you route them to the same partition via a shared key (all events for order-42 use key "order-42", so they're strictly ordered relative to each other).
Storage. Each partition is physically a set of segment files on a broker's disk — the actual append-only log from §3. Retention and compaction (§11) operate per partition.

Partition count is a one-way-ish decision

You can increase a topic's partitions later, but doing so breaks key-to-partition stability (the modulo changes, so existing keys may move) and thus breaks ordering guarantees for in-flight keys. And you can't easily decrease them. So choose partition count deliberately up front based on target throughput and the maximum consumer parallelism you'll need — too few caps your scaling, too many adds overhead (§19, §20).

Brokers, Replication & Durability

A Kafka cluster is a set of brokers (servers); partitions are spread across them, and that's how the cluster scales and survives failure. Durability comes from replication: each partition has one leader replica and some follower replicas on other brokers. All reads and writes go to the leader; followers continuously copy from it. If the leader's broker dies, a follower is promoted — no data lost, brief unavailability. This is the same desired-state/failover thinking as the Kubernetes chapter, applied to data.

Replication

Each partition has a leader and in-sync followers on other brokers

The durability dials: acks & ISR

How safe a write is depends on two settings. The producer's acks says how many replicas must confirm before the write is considered done: acks=0 (fire-and-forget, may lose data), acks=1 (leader only — lost if the leader dies before followers copy it), or acks=all (the leader plus all in-sync replicas — the safe choice). The ISR (in-sync replica set) is the followers currently caught up with the leader; pairing acks=all with min.insync.replicas=2 means a write only succeeds if at least two replicas have it — surviving a single broker loss with zero data loss.

Setting	Guarantee	Trade-off
`acks=0`	None — fire and forget	Fastest; can silently lose data
`acks=1`	Leader has it	Fast; lost if leader fails pre-replication
`acks=all` + `min.insync.replicas=2`	≥2 replicas have it before success	Safest; slightly higher latency — the standard for important data

ZooKeeper is gone — KRaft

Older Kafka relied on a separate ZooKeeper cluster to store metadata and elect leaders. Modern Kafka uses KRaft (Kafka Raft), folding that role into the brokers themselves via a Raft-based controller quorum — fewer moving parts, faster failover, simpler ops. If you read older docs mentioning ZooKeeper, KRaft is its replacement.

Producers

A producer publishes records to a topic. Three producer concerns shape correctness and performance: the key (which decides the partition and thus ordering — §4), the acks (durability — §5), and batching (the producer groups records and compresses them for throughput). A record is a key/value pair plus optional headers and a timestamp; the value is your event payload (often Avro/Protobuf — §12).

// Using confluent-kafka-go (librdkafka). kafka-go (segmentio) is a pure-Go alternative.
p, _ := kafka.NewProducer(&kafka.ConfigMap{
    "bootstrap.servers": "localhost:9092",
    "acks":              "all",   // durability: leader + in-sync replicas (§5)
    "enable.idempotence": true,   // no duplicates on producer retry (§9)
    "linger.ms":         5,       // wait up to 5ms to batch records (throughput)
    "compression.type":  "zstd",
})
defer p.Close()

topic := "orders"
// The KEY decides the partition → all events for one order stay ordered (§4, §10).
err := p.Produce(&kafka.Message{
    TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
    Key:            []byte("order-42"),
    Value:          []byte(`{"event":"placed","amount":1999}`),
}, nil)
if err != nil { log.Fatal(err) }

// Producing is async + batched; Flush blocks until queued messages are delivered.
p.Flush(5000)

// Delivery reports tell you the final partition/offset (or an error) per message:
//   for e := range p.Events() { if m, ok := e.(*kafka.Message); ok { ... } }

# Using confluent-kafka (librdkafka). kafka-python / aiokafka are alternatives.
from confluent_kafka import Producer

p = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",                 # durability: leader + in-sync replicas (§5)
    "enable.idempotence": True,    # no duplicates on producer retry (§9)
    "linger.ms": 5,                # batch window for throughput
    "compression.type": "zstd",
})

def on_delivery(err, msg):         # async callback: final partition/offset or error
    if err:
        print("delivery failed:", err)
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

# The KEY decides the partition → all events for one order stay ordered (§4, §10).
p.produce(
    "orders",
    key=b"order-42",
    value=b'{"event":"placed","amount":1999}',
    on_delivery=on_delivery,
)
p.flush()                          # block until queued messages are delivered

No key = round-robin = no ordering

If you don't set a key, records are spread across partitions (round-robin / sticky), maximizing parallelism but giving up any ordering relationship between them. That's fine for independent events; it's a bug if those events needed to stay ordered (all updates to one account, say). Key by your ordering/entity boundary — usually the entity ID — whenever order matters (§10).

Consumers & Consumer Groups

A consumer reads records from partitions. The powerful abstraction is the consumer group: a set of consumers sharing a group ID that cooperatively divide a topic's partitions among themselves so each partition is read by exactly one consumer in the group. Add consumers to the group and Kafka rebalances, reassigning partitions to spread the load. This is how you scale consumption horizontally — and it's the mechanism behind both work-sharing and broadcast.

Consumer groups

Within a group, partitions are split; different groups each get everything

c, _ := kafka.NewConsumer(&kafka.ConfigMap{
    "bootstrap.servers": "localhost:9092",
    "group.id":          "billing",        // the consumer GROUP — scale by adding instances
    "auto.offset.reset": "earliest",       // where to start if no committed offset (§8)
    "enable.auto.commit": false,           // we'll commit manually for at-least-once (§9)
})
defer c.Close()
c.Subscribe("orders", nil)                 // Kafka assigns this instance some partitions

for {
    msg, err := c.ReadMessage(-1)          // blocks for the next record
    if err != nil { continue }

    process(msg.Value)                     // do the work FIRST...
    c.CommitMessage(msg)                   // ...THEN commit the offset (at-least-once, §9)
}

from confluent_kafka import Consumer

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing",                 # the consumer GROUP — scale by adding instances
    "auto.offset.reset": "earliest",       # where to start if no committed offset (§8)
    "enable.auto.commit": False,           # commit manually for at-least-once (§9)
})
c.subscribe(["orders"])                    # Kafka assigns this instance some partitions

try:
    while True:
        msg = c.poll(1.0)                  # wait up to 1s for the next record
        if msg is None:
            continue
        if msg.error():
            print("error:", msg.error()); continue

        process(msg.value())               # do the work FIRST...
        c.commit(msg)                      # ...THEN commit the offset (at-least-once, §9)
finally:
    c.close()                              # leave the group cleanly → triggers a rebalance

Partitions cap your parallelism

Because one partition is consumed by at most one consumer per group, a group can usefully have at most as many active consumers as the topic has partitions. A topic with 4 partitions and 6 group members leaves 2 consumers idle. So your maximum consumption parallelism is set by partition count (§4) — another reason to size partitions for your peak throughput up front (§20).

Part III · Guarantees

Offsets & Offset Management

An offset is a record's position in its partition (§3). The consumer's job is to track which offset it has processed up to — its committed offset — so that on restart or rebalance it resumes from the right place instead of re-reading everything or skipping records. Kafka stores these committed offsets in an internal topic (__consumer_offsets), keyed by group + partition. The gap between the latest offset in a partition and a group's committed offset is consumer lag — the single most important health metric for a consumer (§19).

Position & lag

Committed offset, current position, and the lag to the tail

When you commit decides your guarantee

The choice between auto-commit (Kafka periodically commits in the background) and manual commit (you commit explicitly) is really a choice about delivery semantics (§9). The ordering of "process the message" versus "commit the offset" is everything: commit after processing and a crash in between means you'll re-process (at-least-once); commit before processing and a crash means you'll skip (at-most-once). Auto-commit on a timer can do either depending on timing, which is why correctness-sensitive consumers commit manually after the work is done (as in §7's code).

Offsets are per group, and you can seek

Because each group has its own committed offsets, a brand-new group can start from earliest and replay the entire retained history — the basis for bootstrapping a new consumer or rebuilding a view (§14). You can also explicitly seek a consumer to any offset or timestamp to reprocess a range. The log's retained immutability (§3) is what makes this possible; a queue can't do it.

Delivery Semantics

The question every distributed messaging system must answer: if things fail and retry, how many times might a message be delivered? There are three possible guarantees, and Kafka can give you any of them depending on configuration — so you must choose consciously based on what your processing can tolerate.

Three semantics

The trade-off between losing messages and duplicating them

At-least-once + idempotent consumers = the practical answer

True end-to-end exactly-once is narrow: Kafka's enable.idempotence stops producer duplicates, and transactions give exactly-once for Kafka-to-Kafka stream processing (consume, transform, produce, and commit offsets atomically). But the moment your consumer writes to an external system (a database, a payment API), exactly-once across that boundary isn't free. The battle-tested approach: choose at-least-once and make your processing idempotent — safe to apply the same event twice (dedupe by event ID, upsert instead of insert). This is the same idempotency discipline as the task-queues chapter (§14 there) and REST idempotency keys, and it's why duplicates become harmless rather than catastrophic.

making at-least-once safe: idempotent processing (pseudocode, both languages)

-- The consumer may see the same event twice on retry/rebalance.
-- Make the write idempotent so a duplicate is a no-op:

-- (a) dedupe by a unique event id
INSERT INTO processed_events (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING;        -- second delivery inserts nothing
-- ...only do the side effect if the insert affected a row.

-- (b) or make the effect itself idempotent — upsert the end state, don't increment
INSERT INTO account_balance (account_id, balance) VALUES ($1, $2)
ON CONFLICT (account_id) DO UPDATE SET balance = EXCLUDED.balance;
-- applying the same "balance = 100" twice lands on the same state.

Ordering Guarantees

One sentence to memorize: Kafka guarantees ordering within a partition, and nowhere else. Records in partition P are delivered to a consumer in the exact offset order they were written. Across partitions of the same topic, there is no global ordering — records interleave arbitrarily. This isn't a limitation to work around so much as the direct price of partition-level parallelism (§4): you can have total ordering, or you can have horizontal scale, but not both at once for the same stream.

Ordering vs parallelism

Keying preserves per-entity order; the trade-off is partition skew

Two ordering foot-guns

First, key skew: if order requires keying by a coarse value (e.g. one mega-tenant), all that tenant's traffic funnels into one partition, capping its throughput regardless of how many partitions exist (§18). Second, retries can reorder: with a non-idempotent producer and multiple in-flight requests, a retried message can land after a later one. Enabling enable.idempotence (§6) preserves per-partition order even across retries — another reason it's a sensible default.

Retention & Log Compaction

Since records are retained after reading (§2), Kafka needs rules for when to remove them — otherwise the log grows forever. There are two cleanup policies, and they answer different needs. Time/size retention (the default) keeps records for a window — say 7 days or 50 GB per partition — then deletes the oldest. Log compaction is cleverer: it keeps the latest record for each key and garbage-collects older versions of that key, turning the log into a continuously-updated snapshot of "the current value per key."

Retention vs compaction

Delete by age, or keep the latest value per key

Compaction powers state & the replay story

A compacted topic is how Kafka holds current state forever without unbounded growth — the basis for event-sourcing snapshots (§14), Kafka Streams' state stores and KTables (§16), and config/lookup data that a new consumer can replay from scratch to rebuild a complete picture. Time-retention topics answer "what happened recently"; compacted topics answer "what is the latest state of everything." Pick the policy by which question your topic exists to answer.

Part IV · Schemas & Patterns

Schemas & the Schema Registry

Kafka itself treats record values as opaque bytes — it neither knows nor cares what's inside. That freedom is a trap at scale: if producers and consumers don't agree on the payload's shape, a producer changing a field silently breaks every downstream consumer (the same contract-drift danger the gRPC and serialization chapters warned about, now spread across many independent teams). The fix is a schema for each topic plus a Schema Registry that stores schemas, assigns them IDs, and enforces compatibility as they evolve.

Schema Registry

Producers and consumers share a registered, versioned schema

Formats & compatibility

The common serialization formats are Avro (the traditional Kafka pairing, compact with a rich schema-evolution story), Protobuf (the same IDL as the gRPC chapter, increasingly popular), and JSON Schema (readable, larger). The registry enforces a compatibility mode — typically backward compatibility, meaning new consumers can read old data: you may add optional fields and remove ones with defaults, but you may not rename or retype a field. This is precisely the additive-change discipline from the gRPC chapter (§04 there: field numbers, never reuse) and REST versioning, enforced automatically by a server.

A schema is a cross-team contract

In a microservice fleet, a topic's schema is a public API that many independent teams depend on. The Schema Registry turns "please don't break the payload" from a tribal convention into a machine-checked gate — the same role buf plays for gRPC .proto files. Treat schema changes with the same care as any API change: additive and compatible by default.

Event-Driven Patterns

Kafka is the substrate for event-driven architecture (EDA), where services communicate by emitting and reacting to events rather than calling each other directly. Instead of the order service commanding the email, inventory, and analytics services (tight coupling, the order service must know them all), it simply announces "OrderPlaced" to a topic, and any interested service subscribes. The producer doesn't know — or care — who consumes. This is the deep payoff of the log's decoupling (§1): you add a new consumer without touching the producer.

Choreography

One event, many independent reactions — the producer is unaware

Pattern	What the event carries	Trade-off
Event notification	Just "X happened" + an ID	Tiny messages; consumers must call back for details (coupling, load)
Event-carried state transfer	The full relevant state in the event	Consumers are self-sufficient (no callbacks); larger messages, possible duplication
Choreography	Services react to events autonomously	Loose coupling, easy to extend; flow is harder to see end-to-end
Orchestration	A coordinator drives the steps (a saga)	Explicit, visible flow; reintroduces a central coordinator

Decoupling isn't free

EDA trades the clarity of a direct call for flexibility. The downside is that no single place describes the whole workflow — "what happens when an order is placed?" is now spread across many subscribers, which makes debugging and reasoning harder. Mitigate with good observability (trace/correlation IDs through event headers, the same idea as §13/§15 of the layers and logging chapters), clear event schemas (§12), and choosing orchestration (an explicit saga) when a flow genuinely needs a visible, coordinated sequence.

Event Sourcing & CQRS

Two advanced patterns the log enables, often confused, frequently paired. Event sourcing stores state as the sequence of events that produced it rather than just the current snapshot: instead of a row balance = 100, you keep Deposited 60, Deposited 50, Withdrew 10 — and derive the balance by replaying them. Kafka's immutable, ordered, retained log (§3) is a natural event store. CQRS (Command Query Responsibility Segregation) separates the write model from one or more read models optimized for querying.

Event sourcing + CQRS

The event log is the source of truth; read models are derived projections

The benefits are real: a complete audit trail (every change is a recorded fact — gold for finance and compliance), the ability to reconstruct state at any past point by replaying to an offset, and multiple specialized read models each rebuildable from the same log (compaction — §11 — provides the snapshot to avoid replaying from the dawn of time).

Powerful, and not free — don't reach for it by default

Event sourcing adds real complexity: schema evolution of historical events is hard (you can never change the past, only how you interpret it), rebuilding state has a cost, and "what's the current value?" now requires a projection rather than a row read. It shines for audit-heavy, complex domains (payments, ledgers — squarely fintech) where the history is the value. For ordinary CRUD, a normal database is simpler and correct — use the pattern where the audit trail and replay genuinely pay for the complexity, not as a default architecture.

The Outbox Pattern & Change Data Capture

A subtle, important problem in event-driven systems: when a service must both update its database and publish an event, doing them as two separate operations is unsafe. If the DB commit succeeds but the Kafka publish fails (or vice versa), your systems diverge — the order exists but no "OrderPlaced" event fired, or an event fired for an order that rolled back. This is the dual-write problem, and you can't fix it by reordering the two writes — there's always a crash window between them.

The dual-write problem & the outbox fix

Write the event into the same DB transaction; relay it to Kafka afterward

The outbox pattern solves it with a single local transaction: write your business change and an "outbox" row describing the event into the same database transaction — so they commit atomically. A separate relay process then reads new outbox rows and publishes them to Kafka (retrying until success, at-least-once — hence consumers stay idempotent, §9). The modern, low-friction way to run that relay is Change Data Capture (CDC): a tool like Debezium tails the database's transaction log and streams committed changes straight into Kafka — no application polling required.

outbox: business write + event in one transaction

BEGIN;

-- 1) the actual business change
INSERT INTO orders (id, customer_id, amount, status)
VALUES ('order-42', 'cust-7', 1999, 'PLACED');

-- 2) the event to publish, written in the SAME transaction → atomic with (1)
INSERT INTO outbox (id, aggregate, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'order-42', 'OrderPlaced',
        '{"id":"order-42","amount":1999}', now());

COMMIT;   -- both succeed together, or neither does — no dual-write gap

-- A relay (or Debezium CDC on the transaction log) then reads new `outbox`
-- rows and produces them to Kafka, marking each published. Retries are safe
-- because downstream consumers are idempotent (§9).

The standard answer to "DB + Kafka together"

Whenever a write must atomically produce an event — ubiquitous in payments and order systems — reach for the outbox (often via Debezium CDC). It converts an impossible distributed-transaction problem into an ordinary local transaction plus a reliable, idempotent relay. This is one of the most practically valuable patterns in the whole event-driven toolkit.

Part V · Processing, Ops & Practice

Stream Processing

So far consumers read events one at a time. Stream processing is the discipline of computing continuously over the stream — filtering, transforming, aggregating, joining — producing new derived streams. Instead of "read event, do a thing," you express a standing query like "a 5-minute rolling count of failed payments per merchant" that updates forever as events arrive. Kafka's own library for this is Kafka Streams (JVM); ksqlDB offers a SQL-like layer, and Apache Flink is the heavyweight for large, complex jobs.

Concept	Meaning
KStream	An unbounded stream of independent events (an append log view) — every record matters.
KTable	A changelog interpreted as current state per key — the compacted-topic view (§11); latest value wins.
Stateful ops	Aggregations/joins that remember across events, backed by a local state store (itself a compacted Kafka topic for fault tolerance).
Windowing	Bucketing events by time (tumbling, hopping, session) to aggregate "per 5 minutes," etc.
Event vs processing time	When the event happened vs when you process it — they differ, and late/out-of-order data must be handled.

The stream/table duality

The elegant core idea: a stream and a table are two views of the same data. A stream of changes aggregated by key gives a table (current state); a table's changelog is a stream of updates. This is exactly the retention-vs-compaction distinction (§11) raised to a programming model — and it's why a compacted topic is a table and a normal topic is a stream. Most backend engineers consume streams with the plain consumer (§7); reach for a stream- processing framework when you need stateful aggregation, joins, or windowing that would be painful to hand-roll.

Kafka vs Traditional Queues

Returning to the §2 distinction with enough context to decide well. This is the question that sends people to the task-queues chapter and back: do I want Kafka or a message queue (RabbitMQ, SQS, the Celery/Sidekiq world)? They overlap enough to confuse and differ enough to matter.

Dimension	Kafka (log)	RabbitMQ / SQS (queue)
Core model	Durable, replayable event log	Transient work queue
After consumption	Retained; replayable by any group	Deleted once acked
Fan-out to many systems	Native — each group reads all	Needs exchanges/fan-out config; not the default
Throughput	Very high (millions/sec)	High, but typically lower
Ordering	Strong per partition	Weak; FIFO modes are limited/slower
Per-message routing/priority/TTL	Limited — not its model	Rich (routing keys, priority, delay, DLQ)
Replay / time-travel	Yes — seek to any offset	No
Best for	Event streaming, many subscribers, replay, analytics, event sourcing	Task offloading, RPC-ish jobs, complex routing, per-message control

The decision heuristic

Reach for Kafka when events are facts many systems care about, when you need high throughput, ordering, or the ability to replay history — the streaming backbone. Reach for a queue (ch.10) when you're handing off discrete units of work to be done once, especially with per-message routing, priorities, delays, or dead-letter handling. They're complementary: a great many real systems run Kafka as the event spine and a task queue for background jobs. Don't force task-queue semantics onto Kafka (per-message TTL, priority, selective ack) — that's swimming against its design.

Consumer Pitfalls

Most production Kafka pain shows up on the consumer side. These are the failure modes that bite real teams — worth recognizing before they page you at 2am.

Pitfall	What happens	Mitigation
Rebalancing storms	Frequent group rebalances (slow processing trips the poll timeout, member declared dead) stall consumption repeatedly	Keep per-poll work bounded, tune `max.poll.records` / `max.poll.interval.ms`; offload slow work; use cooperative/incremental rebalancing
Poison pill	One un-processable message (bad schema, bad data) makes the consumer crash & retry forever — the partition is stuck	Catch per-record errors; route the bad record to a dead-letter topic and move on
Consumer lag	Producers outpace consumers; lag grows unbounded; data gets stale	Add partitions + consumers (§7), speed up processing, batch; alert on lag (§19)
Partition / key skew	A hot key (one giant tenant) overloads its single partition while others idle (§10)	Choose a higher-cardinality key, or a composite key; isolate hot tenants
Slow processing blocks order	To keep order you process a partition serially, so one slow record delays the rest	Parallelize across partitions, not within; consider per-key parallelism patterns
Reprocessing on commit gaps	A crash between processing and offset commit re-delivers records	Expected under at-least-once — make processing idempotent (§9)

Always have a dead-letter strategy

A single malformed message must never be able to halt a whole partition indefinitely — that's a poison pill, and it's a common outage cause. Wrap per-record processing in error handling: on a persistent failure, publish the offending record (plus the error and metadata) to a dead-letter topic, commit past it, and keep the stream flowing. You investigate the dead-letter topic later instead of letting one bad record block everyone — the same DLQ discipline the task-queues chapter applies to failed jobs.

Operations & Monitoring

Running Kafka well comes down to watching a few signals and sizing a few things correctly. You don't need to memorize every metric, but you must know the ones that predict trouble.

Watch	Why it matters
Consumer lag	The headline health metric (§8): growing lag means consumers can't keep up — data is getting stale. Alert on it.
Under-replicated partitions	Replicas falling out of the ISR (§5) — durability is at risk; a broker may be struggling.
Broker disk & throughput	Retention (§11) means topics consume real disk; saturated disk or network throttles the cluster.
Rebalance frequency	Frequent rebalances (§18) indicate unstable consumers and cause stalls.
Request latency & errors	Produce/fetch latency and error rates surface broker or client problems early.

Sizing decisions you actually make

Partition count — sets max consumer parallelism and throughput (§4, §7). Estimate from target throughput and peak consumer count; err slightly high, but not wildly (each partition has overhead, and thousands per broker hurt). Hard to change later, so think ahead (§20).
Replication factor — usually 3 in production: tolerates one broker loss with room to spare. Pair with min.insync.replicas=2 and acks=all (§5).
Retention — how long events live (and thus disk usage and replay window). Set per topic by how far back consumers might need to read.

Tooling

Standard observability: export broker and client JMX metrics to Prometheus + Grafana (the same stack the task-queues and logging chapters use), and watch consumer lag specifically with a lag exporter or a UI like Kafka UI / Conduktor / AKHQ. kafka-consumer-groups.sh --describe gives a quick CLI read of a group's lag per partition when you're debugging live.

Designing with Kafka

Pulling the manual into design judgment. Kafka is a powerful, sharp tool; used where it fits it's transformative, used as a generic message bus it's a liability. A few principles separate good Kafka designs from painful ones.

Design principles

Model topics around events/facts, not commands or RPC. A topic is a stream of "things that happened" (PaymentSettled), consumed by whoever cares — not a way to tell a specific service what to do. If you find yourself wanting request/response, you want gRPC/REST, not Kafka.
Choose the partition key by your ordering boundary. Key by the entity whose events must stay ordered (account, order, user) — but at a granularity that still spreads load, to avoid hot partitions (§10, §18). The key is the single most consequential per-topic decision.
Assume at-least-once; build idempotent consumers. Don't chase end-to-end exactly-once across external systems (§9). Dedupe by event ID or use upserts so duplicates are harmless — cheaper and more robust.
Govern schemas from day one. Use the Schema Registry with backward compatibility (§12) so independent teams can evolve safely. A topic's schema is a contract.
Use the outbox for DB-and-event writes. Never dual-write; make the event atomic with the data and relay it (§15).
Size partitions for peak up front. Repartitioning breaks key ordering (§4), so plan for growth rather than reacting to it.

When Kafka, and when not

Reach for Kafka when you have a stream of events many systems consume, need high throughput or replay, are doing event-driven architecture, event sourcing, analytics pipelines, or log/metric ingestion. Don't reach for it for simple request/response (use gRPC/REST — the gRPC chapter), for discrete background jobs with per-message routing or priority (use a task queue — ch.10), or as your only datastore for point lookups (use a database). Matching the tool to the shape of the problem is the whole skill.

Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

Concept	One-liner
Kafka	A distributed, durable, append-only commit log of events that many consumers read at their own offset.
Queue vs log	Queue: consumed once & deleted. Log: retained, replayable, every group reads all.
Commit log	Append-only, immutable, offset-addressed — sequential I/O, the basis of everything.
Topic	A named stream, split into partitions.
Partition	Unit of parallelism, ordering, and storage. Same key → same partition.
Broker / replication	Cluster of servers; each partition has a leader + follower replicas (ISR).
acks / ISR	`acks=all` + `min.insync.replicas=2` = no data loss on one broker failure.
Producer	Appends keyed records; key sets partition; idempotence avoids dup on retry.
Consumer group	Splits partitions among members. Same group = share work; different groups = broadcast.
Parallelism cap	Max useful consumers per group = partition count.
Offset / lag	Your position per partition; lag = tail − position = the key health metric.
Commit timing	Commit after processing = at-least-once; before = at-most-once.
Delivery	At-most / at-least / exactly-once. Pick at-least-once + idempotent consumers.
Ordering	Guaranteed within a partition only. Key by your ordering boundary.
Retention	Delete by time/size, or compact to keep latest value per key.
Compaction	Latest-per-key → a durable, replayable snapshot of current state (a table).
Schema Registry	Versioned schemas (Avro/Protobuf) with enforced backward compatibility.
EDA	Emit events; consumers react independently — loose coupling, easy to extend.
Event sourcing / CQRS	Log is the source of truth; read models are rebuildable projections.
Outbox / CDC	Write data + event in one DB txn; relay (Debezium) to Kafka — no dual-write gap.
Stream processing	Continuous compute (filter/aggregate/join/window); KStream vs KTable duality.
Pitfalls	Rebalance storms, poison pills (use a DLQ), lag, key skew.
vs queues	Kafka = event streaming/replay/throughput; queues = work + routing/priority. Use both.

The whole topic in one breath: Kafka is a distributed, durable, append-only commit log (§1–3) — unlike a queue, events are retained and every consumer group reads all of them and can replay (§2). A topic is split into partitions, the unit of parallelism, ordering, and storage (§4), spread across brokers and replicated for durability via leaders, followers, and the ISR (§5). Producers append keyed records (§6); consumer groups divide partitions to scale out, with parallelism capped by partition count (§7). Each consumer tracks its offset, and the lag to the tail is the key health signal (§8); when you commit decides your delivery semantics, and the pragmatic choice is at-least-once with idempotent consumers (§9). Ordering holds only within a partition, so you key by your ordering boundary and watch for skew (§10); retention deletes by age while compaction keeps the latest value per key (§11). A Schema Registry governs payloads across teams (§12); the log underpins event-driven architecture (§13), event sourcing and CQRS (§14), and the outbox/CDC pattern for atomic DB-and-event writes (§15). Stream processing computes continuously over it (§16); choose Kafka over a queue for streaming, replay, and throughput (§17), guard against rebalance storms, poison pills, lag, and skew (§18), monitor lag and ISR (§19), and design topics around events keyed sensibly, with idempotent consumers and governed schemas (§20).

Grounded in the Apache Kafka & Confluent docs · the log/stream-table literature (Kreps) · Debezium for CDC · Go 1.22+ (confluent-kafka-go) / Python 3.11+ (confluent-kafka) examples.