Kafka, the log
at the heart of your system.
A first-principles walkthrough of the distributed event-streaming platform that backs the nervous systems of LinkedIn, Uber, Netflix and most modern fintech — from the one idea everything rests on (an append-only commit log), through partitions, consumer groups, offsets and delivery semantics, into retention, schemas, event-driven patterns, the outbox/CDC trick, and stream processing. Written to explain not just what each piece does but why it exists and how it works underneath. Producer/consumer code in both Go and Python.
What Kafka Actually Is
Strip away the marketing and Kafka is one thing: a distributed, append-only, durable log of events that many independent consumers can read at their own pace. Not a queue, not a database, not a message bus in the classic sense — a log. Producers append records to the end; consumers read forward from a position they track themselves; and the records stay after being read, for a configured retention. Once that single picture is in your head, every feature in this manual is a consequence of it.
The official description is "a distributed event streaming platform," and each word earns its place. Distributed: it runs as a cluster of brokers, replicating data for fault tolerance and scaling horizontally. Event streaming: it's built around an unbounded, continuous flow of immutable events (something happened: an order was placed, a payment cleared, a sensor read), not request/response calls. Platform: producers, consumers, a connector ecosystem, and stream processing all sit on the same log.
Kafka is a distributed, durable, append-only commit log of immutable events, partitioned for scale and replicated for fault tolerance, that multiple independent consumers read at their own offset — with events retained after reading. Everything else is detail on top of that.
Queue vs Log — Why Kafka Is Different
This is the most important section in the manual, because conflating Kafka with a traditional message queue is the root of most confusion. The task-queues chapter covered the queue model in depth — producers, workers, brokers like RabbitMQ and SQS, retries, visibility timeouts. Kafka is a different shape of thing, and the difference is not a detail; it changes what problems it solves.
| Dimension | Traditional queue (ch.10) | Kafka log |
|---|---|---|
| After a message is read | Deleted (consumed once) | Retained (until retention expires) |
| Who receives a message | One competing consumer | Every consumer group, independently |
| Replay / re-read | No — it's gone | Yes — seek to any offset |
| Consumer position | Broker tracks & removes | Consumer tracks its own offset |
| Ordering | Weak/none across consumers | Strict within a partition |
| Mental model | Work distribution (to-do list) | Event history / source of truth (logbook) |
| Sweet spot | Task offloading, job processing | Event streaming, many subscribers, replay, high throughput |
You'll often use both: Kafka as the durable event backbone (orders, payments, clicks streamed to many systems), and a task queue (Celery/SQS — ch.10) for discrete background jobs (send this email, resize that image). The full comparison and "which when" is §17. For now hold the distinction: queue = distribute work and forget; log = record events and let many systems consume & replay them.
The Commit Log
Everything Kafka does rests on the humblest data structure in computing: the append-only log — an ordered sequence of records you can only add to the end of, never insert into the middle of or modify. Each record gets a monotonically increasing offset (0, 1, 2, …) that is its permanent address. This is the same structure a database uses for its write-ahead log and durability; Kafka's insight was to make the log itself the primary, public abstraction rather than a hidden internal detail.
Why an append-only log is so fast and so useful
- Sequential I/O is cheap. Appending to the end of a file and reading it forward are sequential disk operations — dramatically faster than random access, even on SSDs. Kafka leans on the OS page cache and zero-copy transfer to move data from disk to network without passing through application memory, which is how it sustains millions of messages per second on modest hardware.
- Immutability is simple & safe. Records never change, so there are no locks for updates, no read/write conflicts, and concurrent readers never interfere. An immutable history is also trivially cacheable and easy to reason about.
- The log is the history. Because nothing is overwritten, the log is a complete, ordered record of everything that happened — which is exactly what makes replay (§11), event sourcing (§14), and feeding new consumers from the beginning possible.
Hold the append-only log firmly: writes append and get an offset; reads move forward; records are immutable and retained. Partitions (§4) are just multiple such logs; consumer groups (§7) are bookmarks into them; delivery semantics (§9) are about how carefully you track those bookmarks. The whole system is the log idea, scaled and replicated.
Topics & Partitions
A topic is a named stream of related events — orders, payments,
page-views. It's the logical channel producers write to and consumers read from. But a topic is not
a single log; it's split into partitions, and the partition is where all the real mechanics
live. The partition is simultaneously Kafka's unit of parallelism, of ordering, and of
storage — three of the most important properties in the system, all riding on this one concept.
The three roles of a partition
- Parallelism. Partitions can live on different brokers and be consumed by different consumers at once, so a topic's throughput scales with its partition count. More partitions → more parallel consumption (up to the consumer-group limit — §7).
- Ordering. Kafka guarantees order within a single partition and makes no promise
across partitions. This is the central trade-off (§10): to keep related events ordered, you
route them to the same partition via a shared key (all events for
order-42use key"order-42", so they're strictly ordered relative to each other). - Storage. Each partition is physically a set of segment files on a broker's disk — the actual append-only log from §3. Retention and compaction (§11) operate per partition.
You can increase a topic's partitions later, but doing so breaks key-to-partition stability (the modulo changes, so existing keys may move) and thus breaks ordering guarantees for in-flight keys. And you can't easily decrease them. So choose partition count deliberately up front based on target throughput and the maximum consumer parallelism you'll need — too few caps your scaling, too many adds overhead (§19, §20).
Brokers, Replication & Durability
A Kafka cluster is a set of brokers (servers); partitions are spread across them, and that's how the cluster scales and survives failure. Durability comes from replication: each partition has one leader replica and some follower replicas on other brokers. All reads and writes go to the leader; followers continuously copy from it. If the leader's broker dies, a follower is promoted — no data lost, brief unavailability. This is the same desired-state/failover thinking as the Kubernetes chapter, applied to data.
The durability dials: acks & ISR
How safe a write is depends on two settings. The producer's acks says how many
replicas must confirm before the write is considered done: acks=0 (fire-and-forget, may lose
data), acks=1 (leader only — lost if the leader dies before followers copy it), or
acks=all (the leader plus all in-sync replicas — the safe choice). The
ISR (in-sync replica set) is the followers currently caught up with the leader; pairing
acks=all with min.insync.replicas=2 means a write only succeeds if at least two
replicas have it — surviving a single broker loss with zero data loss.
| Setting | Guarantee | Trade-off |
|---|---|---|
acks=0 | None — fire and forget | Fastest; can silently lose data |
acks=1 | Leader has it | Fast; lost if leader fails pre-replication |
acks=all + min.insync.replicas=2 | ≥2 replicas have it before success | Safest; slightly higher latency — the standard for important data |
Older Kafka relied on a separate ZooKeeper cluster to store metadata and elect leaders. Modern Kafka uses KRaft (Kafka Raft), folding that role into the brokers themselves via a Raft-based controller quorum — fewer moving parts, faster failover, simpler ops. If you read older docs mentioning ZooKeeper, KRaft is its replacement.
Producers
A producer publishes records to a topic. Three producer concerns shape correctness and performance: the key (which decides the partition and thus ordering — §4), the acks (durability — §5), and batching (the producer groups records and compresses them for throughput). A record is a key/value pair plus optional headers and a timestamp; the value is your event payload (often Avro/Protobuf — §12).
// Using confluent-kafka-go (librdkafka). kafka-go (segmentio) is a pure-Go alternative.
p, _ := kafka.NewProducer(&kafka.ConfigMap{
"bootstrap.servers": "localhost:9092",
"acks": "all", // durability: leader + in-sync replicas (§5)
"enable.idempotence": true, // no duplicates on producer retry (§9)
"linger.ms": 5, // wait up to 5ms to batch records (throughput)
"compression.type": "zstd",
})
defer p.Close()
topic := "orders"
// The KEY decides the partition → all events for one order stay ordered (§4, §10).
err := p.Produce(&kafka.Message{
TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
Key: []byte("order-42"),
Value: []byte(`{"event":"placed","amount":1999}`),
}, nil)
if err != nil { log.Fatal(err) }
// Producing is async + batched; Flush blocks until queued messages are delivered.
p.Flush(5000)
// Delivery reports tell you the final partition/offset (or an error) per message:
// for e := range p.Events() { if m, ok := e.(*kafka.Message); ok { ... } }# Using confluent-kafka (librdkafka). kafka-python / aiokafka are alternatives.
from confluent_kafka import Producer
p = Producer({
"bootstrap.servers": "localhost:9092",
"acks": "all", # durability: leader + in-sync replicas (§5)
"enable.idempotence": True, # no duplicates on producer retry (§9)
"linger.ms": 5, # batch window for throughput
"compression.type": "zstd",
})
def on_delivery(err, msg): # async callback: final partition/offset or error
if err:
print("delivery failed:", err)
else:
print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")
# The KEY decides the partition → all events for one order stay ordered (§4, §10).
p.produce(
"orders",
key=b"order-42",
value=b'{"event":"placed","amount":1999}',
on_delivery=on_delivery,
)
p.flush() # block until queued messages are deliveredIf you don't set a key, records are spread across partitions (round-robin / sticky), maximizing parallelism but giving up any ordering relationship between them. That's fine for independent events; it's a bug if those events needed to stay ordered (all updates to one account, say). Key by your ordering/entity boundary — usually the entity ID — whenever order matters (§10).
Consumers & Consumer Groups
A consumer reads records from partitions. The powerful abstraction is the consumer group: a set of consumers sharing a group ID that cooperatively divide a topic's partitions among themselves so each partition is read by exactly one consumer in the group. Add consumers to the group and Kafka rebalances, reassigning partitions to spread the load. This is how you scale consumption horizontally — and it's the mechanism behind both work-sharing and broadcast.
c, _ := kafka.NewConsumer(&kafka.ConfigMap{
"bootstrap.servers": "localhost:9092",
"group.id": "billing", // the consumer GROUP — scale by adding instances
"auto.offset.reset": "earliest", // where to start if no committed offset (§8)
"enable.auto.commit": false, // we'll commit manually for at-least-once (§9)
})
defer c.Close()
c.Subscribe("orders", nil) // Kafka assigns this instance some partitions
for {
msg, err := c.ReadMessage(-1) // blocks for the next record
if err != nil { continue }
process(msg.Value) // do the work FIRST...
c.CommitMessage(msg) // ...THEN commit the offset (at-least-once, §9)
}from confluent_kafka import Consumer
c = Consumer({
"bootstrap.servers": "localhost:9092",
"group.id": "billing", # the consumer GROUP — scale by adding instances
"auto.offset.reset": "earliest", # where to start if no committed offset (§8)
"enable.auto.commit": False, # commit manually for at-least-once (§9)
})
c.subscribe(["orders"]) # Kafka assigns this instance some partitions
try:
while True:
msg = c.poll(1.0) # wait up to 1s for the next record
if msg is None:
continue
if msg.error():
print("error:", msg.error()); continue
process(msg.value()) # do the work FIRST...
c.commit(msg) # ...THEN commit the offset (at-least-once, §9)
finally:
c.close() # leave the group cleanly → triggers a rebalanceBecause one partition is consumed by at most one consumer per group, a group can usefully have at most as many active consumers as the topic has partitions. A topic with 4 partitions and 6 group members leaves 2 consumers idle. So your maximum consumption parallelism is set by partition count (§4) — another reason to size partitions for your peak throughput up front (§20).
Offsets & Offset Management
An offset is a record's position in its partition (§3). The consumer's job is to track
which offset it has processed up to — its committed offset — so that on restart or
rebalance it resumes from the right place instead of re-reading everything or skipping records. Kafka stores
these committed offsets in an internal topic (__consumer_offsets), keyed by group + partition. The
gap between the latest offset in a partition and a group's committed offset is consumer lag —
the single most important health metric for a consumer (§19).
When you commit decides your guarantee
The choice between auto-commit (Kafka periodically commits in the background) and manual commit (you commit explicitly) is really a choice about delivery semantics (§9). The ordering of "process the message" versus "commit the offset" is everything: commit after processing and a crash in between means you'll re-process (at-least-once); commit before processing and a crash means you'll skip (at-most-once). Auto-commit on a timer can do either depending on timing, which is why correctness-sensitive consumers commit manually after the work is done (as in §7's code).
Because each group has its own committed offsets, a brand-new group can start from
earliest and replay the entire retained history — the basis for bootstrapping a new
consumer or rebuilding a view (§14). You can also explicitly seek a consumer to any offset
or timestamp to reprocess a range. The log's retained immutability (§3) is what makes this possible;
a queue can't do it.
Delivery Semantics
The question every distributed messaging system must answer: if things fail and retry, how many times might a message be delivered? There are three possible guarantees, and Kafka can give you any of them depending on configuration — so you must choose consciously based on what your processing can tolerate.
True end-to-end exactly-once is narrow: Kafka's enable.idempotence stops producer
duplicates, and transactions give exactly-once for Kafka-to-Kafka stream processing
(consume, transform, produce, and commit offsets atomically). But the moment your consumer writes to an
external system (a database, a payment API), exactly-once across that boundary isn't free. The
battle-tested approach: choose at-least-once and make your processing
idempotent — safe to apply the same event twice (dedupe by event ID, upsert instead of
insert). This is the same idempotency discipline as the task-queues chapter (§14 there) and REST
idempotency keys, and it's why duplicates become harmless rather than catastrophic.
-- The consumer may see the same event twice on retry/rebalance.
-- Make the write idempotent so a duplicate is a no-op:
-- (a) dedupe by a unique event id
INSERT INTO processed_events (event_id) VALUES ($1)
ON CONFLICT (event_id) DO NOTHING; -- second delivery inserts nothing
-- ...only do the side effect if the insert affected a row.
-- (b) or make the effect itself idempotent — upsert the end state, don't increment
INSERT INTO account_balance (account_id, balance) VALUES ($1, $2)
ON CONFLICT (account_id) DO UPDATE SET balance = EXCLUDED.balance;
-- applying the same "balance = 100" twice lands on the same state.
Ordering Guarantees
One sentence to memorize: Kafka guarantees ordering within a partition, and nowhere else. Records in partition P are delivered to a consumer in the exact offset order they were written. Across partitions of the same topic, there is no global ordering — records interleave arbitrarily. This isn't a limitation to work around so much as the direct price of partition-level parallelism (§4): you can have total ordering, or you can have horizontal scale, but not both at once for the same stream.
First, key skew: if order requires keying by a coarse value (e.g. one mega-tenant), all
that tenant's traffic funnels into one partition, capping its throughput regardless of how many partitions
exist (§18). Second, retries can reorder: with a non-idempotent producer and multiple
in-flight requests, a retried message can land after a later one. Enabling enable.idempotence
(§6) preserves per-partition order even across retries — another reason it's a sensible default.
Retention & Log Compaction
Since records are retained after reading (§2), Kafka needs rules for when to remove them — otherwise the log grows forever. There are two cleanup policies, and they answer different needs. Time/size retention (the default) keeps records for a window — say 7 days or 50 GB per partition — then deletes the oldest. Log compaction is cleverer: it keeps the latest record for each key and garbage-collects older versions of that key, turning the log into a continuously-updated snapshot of "the current value per key."
A compacted topic is how Kafka holds current state forever without unbounded growth — the basis for event-sourcing snapshots (§14), Kafka Streams' state stores and KTables (§16), and config/lookup data that a new consumer can replay from scratch to rebuild a complete picture. Time-retention topics answer "what happened recently"; compacted topics answer "what is the latest state of everything." Pick the policy by which question your topic exists to answer.
Schemas & the Schema Registry
Kafka itself treats record values as opaque bytes — it neither knows nor cares what's inside. That freedom is a trap at scale: if producers and consumers don't agree on the payload's shape, a producer changing a field silently breaks every downstream consumer (the same contract-drift danger the gRPC and serialization chapters warned about, now spread across many independent teams). The fix is a schema for each topic plus a Schema Registry that stores schemas, assigns them IDs, and enforces compatibility as they evolve.
Formats & compatibility
The common serialization formats are Avro (the traditional Kafka pairing, compact with a rich schema-evolution story), Protobuf (the same IDL as the gRPC chapter, increasingly popular), and JSON Schema (readable, larger). The registry enforces a compatibility mode — typically backward compatibility, meaning new consumers can read old data: you may add optional fields and remove ones with defaults, but you may not rename or retype a field. This is precisely the additive-change discipline from the gRPC chapter (§04 there: field numbers, never reuse) and REST versioning, enforced automatically by a server.
In a microservice fleet, a topic's schema is a public API that many independent teams depend on. The Schema
Registry turns "please don't break the payload" from a tribal convention into a machine-checked gate —
the same role buf plays for gRPC .proto files. Treat schema changes with the same care as any
API change: additive and compatible by default.
Event-Driven Patterns
Kafka is the substrate for event-driven architecture (EDA), where services communicate by emitting and reacting to events rather than calling each other directly. Instead of the order service commanding the email, inventory, and analytics services (tight coupling, the order service must know them all), it simply announces "OrderPlaced" to a topic, and any interested service subscribes. The producer doesn't know — or care — who consumes. This is the deep payoff of the log's decoupling (§1): you add a new consumer without touching the producer.
| Pattern | What the event carries | Trade-off |
|---|---|---|
| Event notification | Just "X happened" + an ID | Tiny messages; consumers must call back for details (coupling, load) |
| Event-carried state transfer | The full relevant state in the event | Consumers are self-sufficient (no callbacks); larger messages, possible duplication |
| Choreography | Services react to events autonomously | Loose coupling, easy to extend; flow is harder to see end-to-end |
| Orchestration | A coordinator drives the steps (a saga) | Explicit, visible flow; reintroduces a central coordinator |
EDA trades the clarity of a direct call for flexibility. The downside is that no single place describes the whole workflow — "what happens when an order is placed?" is now spread across many subscribers, which makes debugging and reasoning harder. Mitigate with good observability (trace/correlation IDs through event headers, the same idea as §13/§15 of the layers and logging chapters), clear event schemas (§12), and choosing orchestration (an explicit saga) when a flow genuinely needs a visible, coordinated sequence.
Event Sourcing & CQRS
Two advanced patterns the log enables, often confused, frequently paired. Event sourcing
stores state as the sequence of events that produced it rather than just the current snapshot: instead
of a row balance = 100, you keep Deposited 60, Deposited 50,
Withdrew 10 — and derive the balance by replaying them. Kafka's immutable, ordered, retained
log (§3) is a natural event store. CQRS (Command Query Responsibility Segregation)
separates the write model from one or more read models optimized for querying.
The benefits are real: a complete audit trail (every change is a recorded fact — gold for finance and compliance), the ability to reconstruct state at any past point by replaying to an offset, and multiple specialized read models each rebuildable from the same log (compaction — §11 — provides the snapshot to avoid replaying from the dawn of time).
Event sourcing adds real complexity: schema evolution of historical events is hard (you can never change the past, only how you interpret it), rebuilding state has a cost, and "what's the current value?" now requires a projection rather than a row read. It shines for audit-heavy, complex domains (payments, ledgers — squarely fintech) where the history is the value. For ordinary CRUD, a normal database is simpler and correct — use the pattern where the audit trail and replay genuinely pay for the complexity, not as a default architecture.
The Outbox Pattern & Change Data Capture
A subtle, important problem in event-driven systems: when a service must both update its database and publish an event, doing them as two separate operations is unsafe. If the DB commit succeeds but the Kafka publish fails (or vice versa), your systems diverge — the order exists but no "OrderPlaced" event fired, or an event fired for an order that rolled back. This is the dual-write problem, and you can't fix it by reordering the two writes — there's always a crash window between them.
The outbox pattern solves it with a single local transaction: write your business change and an "outbox" row describing the event into the same database transaction — so they commit atomically. A separate relay process then reads new outbox rows and publishes them to Kafka (retrying until success, at-least-once — hence consumers stay idempotent, §9). The modern, low-friction way to run that relay is Change Data Capture (CDC): a tool like Debezium tails the database's transaction log and streams committed changes straight into Kafka — no application polling required.
BEGIN;
-- 1) the actual business change
INSERT INTO orders (id, customer_id, amount, status)
VALUES ('order-42', 'cust-7', 1999, 'PLACED');
-- 2) the event to publish, written in the SAME transaction → atomic with (1)
INSERT INTO outbox (id, aggregate, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'order-42', 'OrderPlaced',
'{"id":"order-42","amount":1999}', now());
COMMIT; -- both succeed together, or neither does — no dual-write gap
-- A relay (or Debezium CDC on the transaction log) then reads new `outbox`
-- rows and produces them to Kafka, marking each published. Retries are safe
-- because downstream consumers are idempotent (§9).
Whenever a write must atomically produce an event — ubiquitous in payments and order systems — reach for the outbox (often via Debezium CDC). It converts an impossible distributed-transaction problem into an ordinary local transaction plus a reliable, idempotent relay. This is one of the most practically valuable patterns in the whole event-driven toolkit.
Stream Processing
So far consumers read events one at a time. Stream processing is the discipline of computing continuously over the stream — filtering, transforming, aggregating, joining — producing new derived streams. Instead of "read event, do a thing," you express a standing query like "a 5-minute rolling count of failed payments per merchant" that updates forever as events arrive. Kafka's own library for this is Kafka Streams (JVM); ksqlDB offers a SQL-like layer, and Apache Flink is the heavyweight for large, complex jobs.
| Concept | Meaning |
|---|---|
| KStream | An unbounded stream of independent events (an append log view) — every record matters. |
| KTable | A changelog interpreted as current state per key — the compacted-topic view (§11); latest value wins. |
| Stateful ops | Aggregations/joins that remember across events, backed by a local state store (itself a compacted Kafka topic for fault tolerance). |
| Windowing | Bucketing events by time (tumbling, hopping, session) to aggregate "per 5 minutes," etc. |
| Event vs processing time | When the event happened vs when you process it — they differ, and late/out-of-order data must be handled. |
The elegant core idea: a stream and a table are two views of the same data. A stream of changes aggregated by key gives a table (current state); a table's changelog is a stream of updates. This is exactly the retention-vs-compaction distinction (§11) raised to a programming model — and it's why a compacted topic is a table and a normal topic is a stream. Most backend engineers consume streams with the plain consumer (§7); reach for a stream- processing framework when you need stateful aggregation, joins, or windowing that would be painful to hand-roll.
Kafka vs Traditional Queues
Returning to the §2 distinction with enough context to decide well. This is the question that sends people to the task-queues chapter and back: do I want Kafka or a message queue (RabbitMQ, SQS, the Celery/Sidekiq world)? They overlap enough to confuse and differ enough to matter.
| Dimension | Kafka (log) | RabbitMQ / SQS (queue) |
|---|---|---|
| Core model | Durable, replayable event log | Transient work queue |
| After consumption | Retained; replayable by any group | Deleted once acked |
| Fan-out to many systems | Native — each group reads all | Needs exchanges/fan-out config; not the default |
| Throughput | Very high (millions/sec) | High, but typically lower |
| Ordering | Strong per partition | Weak; FIFO modes are limited/slower |
| Per-message routing/priority/TTL | Limited — not its model | Rich (routing keys, priority, delay, DLQ) |
| Replay / time-travel | Yes — seek to any offset | No |
| Best for | Event streaming, many subscribers, replay, analytics, event sourcing | Task offloading, RPC-ish jobs, complex routing, per-message control |
Reach for Kafka when events are facts many systems care about, when you need high throughput, ordering, or the ability to replay history — the streaming backbone. Reach for a queue (ch.10) when you're handing off discrete units of work to be done once, especially with per-message routing, priorities, delays, or dead-letter handling. They're complementary: a great many real systems run Kafka as the event spine and a task queue for background jobs. Don't force task-queue semantics onto Kafka (per-message TTL, priority, selective ack) — that's swimming against its design.
Consumer Pitfalls
Most production Kafka pain shows up on the consumer side. These are the failure modes that bite real teams — worth recognizing before they page you at 2am.
| Pitfall | What happens | Mitigation |
|---|---|---|
| Rebalancing storms | Frequent group rebalances (slow processing trips the poll timeout, member declared dead) stall consumption repeatedly | Keep per-poll work bounded, tune max.poll.records / max.poll.interval.ms; offload slow work; use cooperative/incremental rebalancing |
| Poison pill | One un-processable message (bad schema, bad data) makes the consumer crash & retry forever — the partition is stuck | Catch per-record errors; route the bad record to a dead-letter topic and move on |
| Consumer lag | Producers outpace consumers; lag grows unbounded; data gets stale | Add partitions + consumers (§7), speed up processing, batch; alert on lag (§19) |
| Partition / key skew | A hot key (one giant tenant) overloads its single partition while others idle (§10) | Choose a higher-cardinality key, or a composite key; isolate hot tenants |
| Slow processing blocks order | To keep order you process a partition serially, so one slow record delays the rest | Parallelize across partitions, not within; consider per-key parallelism patterns |
| Reprocessing on commit gaps | A crash between processing and offset commit re-delivers records | Expected under at-least-once — make processing idempotent (§9) |
A single malformed message must never be able to halt a whole partition indefinitely — that's a poison pill, and it's a common outage cause. Wrap per-record processing in error handling: on a persistent failure, publish the offending record (plus the error and metadata) to a dead-letter topic, commit past it, and keep the stream flowing. You investigate the dead-letter topic later instead of letting one bad record block everyone — the same DLQ discipline the task-queues chapter applies to failed jobs.
Operations & Monitoring
Running Kafka well comes down to watching a few signals and sizing a few things correctly. You don't need to memorize every metric, but you must know the ones that predict trouble.
| Watch | Why it matters |
|---|---|
| Consumer lag | The headline health metric (§8): growing lag means consumers can't keep up — data is getting stale. Alert on it. |
| Under-replicated partitions | Replicas falling out of the ISR (§5) — durability is at risk; a broker may be struggling. |
| Broker disk & throughput | Retention (§11) means topics consume real disk; saturated disk or network throttles the cluster. |
| Rebalance frequency | Frequent rebalances (§18) indicate unstable consumers and cause stalls. |
| Request latency & errors | Produce/fetch latency and error rates surface broker or client problems early. |
Sizing decisions you actually make
- Partition count — sets max consumer parallelism and throughput (§4, §7). Estimate from target throughput and peak consumer count; err slightly high, but not wildly (each partition has overhead, and thousands per broker hurt). Hard to change later, so think ahead (§20).
- Replication factor — usually 3 in production: tolerates one broker loss with room to
spare. Pair with
min.insync.replicas=2andacks=all(§5). - Retention — how long events live (and thus disk usage and replay window). Set per topic by how far back consumers might need to read.
Standard observability: export broker and client JMX metrics to Prometheus + Grafana (the same stack the
task-queues and logging chapters use), and watch consumer lag specifically with a lag exporter or a UI like
Kafka UI / Conduktor / AKHQ. kafka-consumer-groups.sh --describe gives a quick CLI read of a
group's lag per partition when you're debugging live.
Designing with Kafka
Pulling the manual into design judgment. Kafka is a powerful, sharp tool; used where it fits it's transformative, used as a generic message bus it's a liability. A few principles separate good Kafka designs from painful ones.
Design principles
- Model topics around events/facts, not commands or RPC. A topic is a stream of "things that
happened" (
PaymentSettled), consumed by whoever cares — not a way to tell a specific service what to do. If you find yourself wanting request/response, you want gRPC/REST, not Kafka. - Choose the partition key by your ordering boundary. Key by the entity whose events must stay ordered (account, order, user) — but at a granularity that still spreads load, to avoid hot partitions (§10, §18). The key is the single most consequential per-topic decision.
- Assume at-least-once; build idempotent consumers. Don't chase end-to-end exactly-once across external systems (§9). Dedupe by event ID or use upserts so duplicates are harmless — cheaper and more robust.
- Govern schemas from day one. Use the Schema Registry with backward compatibility (§12) so independent teams can evolve safely. A topic's schema is a contract.
- Use the outbox for DB-and-event writes. Never dual-write; make the event atomic with the data and relay it (§15).
- Size partitions for peak up front. Repartitioning breaks key ordering (§4), so plan for growth rather than reacting to it.
Reach for Kafka when you have a stream of events many systems consume, need high throughput or replay, are doing event-driven architecture, event sourcing, analytics pipelines, or log/metric ingestion. Don't reach for it for simple request/response (use gRPC/REST — the gRPC chapter), for discrete background jobs with per-message routing or priority (use a task queue — ch.10), or as your only datastore for point lookups (use a database). Matching the tool to the shape of the problem is the whole skill.
Cheat-Sheet
The whole manual compressed to what you reach for under pressure.
| Concept | One-liner |
|---|---|
| Kafka | A distributed, durable, append-only commit log of events that many consumers read at their own offset. |
| Queue vs log | Queue: consumed once & deleted. Log: retained, replayable, every group reads all. |
| Commit log | Append-only, immutable, offset-addressed — sequential I/O, the basis of everything. |
| Topic | A named stream, split into partitions. |
| Partition | Unit of parallelism, ordering, and storage. Same key → same partition. |
| Broker / replication | Cluster of servers; each partition has a leader + follower replicas (ISR). |
| acks / ISR | acks=all + min.insync.replicas=2 = no data loss on one broker failure. |
| Producer | Appends keyed records; key sets partition; idempotence avoids dup on retry. |
| Consumer group | Splits partitions among members. Same group = share work; different groups = broadcast. |
| Parallelism cap | Max useful consumers per group = partition count. |
| Offset / lag | Your position per partition; lag = tail − position = the key health metric. |
| Commit timing | Commit after processing = at-least-once; before = at-most-once. |
| Delivery | At-most / at-least / exactly-once. Pick at-least-once + idempotent consumers. |
| Ordering | Guaranteed within a partition only. Key by your ordering boundary. |
| Retention | Delete by time/size, or compact to keep latest value per key. |
| Compaction | Latest-per-key → a durable, replayable snapshot of current state (a table). |
| Schema Registry | Versioned schemas (Avro/Protobuf) with enforced backward compatibility. |
| EDA | Emit events; consumers react independently — loose coupling, easy to extend. |
| Event sourcing / CQRS | Log is the source of truth; read models are rebuildable projections. |
| Outbox / CDC | Write data + event in one DB txn; relay (Debezium) to Kafka — no dual-write gap. |
| Stream processing | Continuous compute (filter/aggregate/join/window); KStream vs KTable duality. |
| Pitfalls | Rebalance storms, poison pills (use a DLQ), lag, key skew. |
| vs queues | Kafka = event streaming/replay/throughput; queues = work + routing/priority. Use both. |
The whole topic in one breath: Kafka is a distributed, durable, append-only commit log (§1–3) — unlike a queue, events are retained and every consumer group reads all of them and can replay (§2). A topic is split into partitions, the unit of parallelism, ordering, and storage (§4), spread across brokers and replicated for durability via leaders, followers, and the ISR (§5). Producers append keyed records (§6); consumer groups divide partitions to scale out, with parallelism capped by partition count (§7). Each consumer tracks its offset, and the lag to the tail is the key health signal (§8); when you commit decides your delivery semantics, and the pragmatic choice is at-least-once with idempotent consumers (§9). Ordering holds only within a partition, so you key by your ordering boundary and watch for skew (§10); retention deletes by age while compaction keeps the latest value per key (§11). A Schema Registry governs payloads across teams (§12); the log underpins event-driven architecture (§13), event sourcing and CQRS (§14), and the outbox/CDC pattern for atomic DB-and-event writes (§15). Stream processing computes continuously over it (§16); choose Kafka over a queue for streaming, replay, and throughput (§17), guard against rebalance storms, poison pills, lag, and skew (§18), monitor lag and ISR (§19), and design topics around events keyed sensibly, with idempotent consumers and governed schemas (§20).
Grounded in the Apache Kafka & Confluent docs · the log/stream-table literature (Kreps) · Debezium for CDC · Go 1.22+ (confluent-kafka-go) / Python 3.11+ (confluent-kafka) examples.