A detailed backend reference

WebSockets, a two-way pipe the server can push down.

A first-principles walkthrough of real-time backend communication — from why HTTP's request–response shape can't push, through the WebSocket upgrade handshake, frames, and connection lifecycle, into building a server, managing thousands of connections, handling backpressure, and the hard part: scaling stateful connections across many instances with a pub/sub backplane. Written to explain not just what each piece does but why it exists and how it works underneath. Server and client code in both Go and Python.

Full-duplex over TCP · Upgrade from HTTP · Go 1.22+ · Python 3.11+ · 21 sections

Part I · Why & What

The Problem WebSockets Solve

Plain HTTP has one shape: the client asks, the server answers (the HTTP chapter's request–response model). The server can never speak first; it has no way to push data to a client that hasn't just asked for it. That's fine for fetching a page or calling an API, but it breaks down the moment you need the server to notify the client as things happen: a new chat message, a live price tick, a notification, a collaborator's cursor moving, a match found. The server has the update; HTTP gives it no channel to deliver it.

Before WebSockets, you faked server push by having the client ask repeatedly — polling. Every few seconds the client sends "anything new?" and usually hears "no." This is wasteful on every axis: a flood of requests that mostly return nothing, the full overhead of HTTP headers and connection setup on each one, and latency bounded by your polling interval (poll every 5s and an event can sit undelivered for nearly 5s). WebSockets exist to replace that hack with a real persistent, two-way connection where either side can send at any time.

Polling is calling the post office every ten minutes to ask if mail arrived. A WebSocket is a direct phone line left open between you and them — either party simply speaks the instant there's something to say, with no redialing and no "anything yet?" overhead.

The core problem

Polling wastes round-trips; a WebSocket pushes the instant data exists

One connection, opened once, over which messages flow both ways the moment they exist — that's the entire value proposition.

The one idea

HTTP can't let the server initiate. WebSockets establish a single, long-lived, full-duplex connection so either side can send a message at any time, with no per-message request overhead and no polling delay. Everything else is how that pipe is opened, framed, managed, and scaled.

The Real-Time Spectrum

WebSockets aren't the only way to get server-to-client updates, and reaching for them reflexively is a common mistake. There's a spectrum of techniques, each a different point on the trade-off between simplicity and capability. Knowing all four lets you pick the simplest thing that meets the need (the full decision guide is §18).

Four approaches

From crude polling to full-duplex WebSockets

Technique	Direction	How it works	Best when
Short polling	client pulls	Repeated requests on a timer	Updates are rare & latency tolerance is high; you want zero infrastructure
Long polling	client pulls	Server holds the request open until it has data, then the client re-asks	You need push-like behavior but must work through any old proxy/client
SSE	server → client	A single long-lived HTTP response streaming text events	One-way server push (feeds, notifications) — simpler than WS
WebSocket	both ways	Persistent full-duplex connection upgraded from HTTP	True bidirectional, low-latency interaction (chat, games, collaboration)

Don't reach for WebSockets by default

WebSockets are the most capable option and the most operationally demanding — persistent stateful connections that complicate scaling, load balancing, and deployment (Parts III–IV). If you only need the server to push to the client (a live feed, notifications, progress), SSE is dramatically simpler: it's just HTTP, it auto-reconnects, and it sails through proxies. Use the full duplex of WebSockets only when the client genuinely needs to send frequently too (§18).

What a WebSocket Actually Is

A WebSocket is a persistent, bidirectional, full-duplex communication channel over a single TCP connection, established through an HTTP request and then kept open. Unpack each word: persistent — opened once and held for the session, not per message; bidirectional / full-duplex — both sides can send simultaneously and independently, not taking turns; single TCP connection — one socket carries everything, on the same ports as HTTP (80/443), so it traverses firewalls that allow web traffic.

The clever design choice is that a WebSocket starts life as an HTTP request and then upgrades (§4). This isn't an accident — it's what lets WebSockets reuse the existing web infrastructure (ports, TLS, proxies, the same origin) instead of requiring a new port or protocol that firewalls would block. Once upgraded, the connection stops speaking HTTP and starts speaking the WebSocket framing protocol (§5). The URL scheme reflects this: ws:// (like http://) and wss:// (TLS, like https://).

From HTTP to a persistent pipe

One TCP connection: born as HTTP, upgraded, then full-duplex frames

The same socket transitions from a normal HTTP exchange into a long-lived WebSocket — no second connection, no new port.

Relationship to HTTP/2 & gRPC streaming

WebSockets predate and sit alongside HTTP/2. HTTP/2 has its own multiplexed streams (the gRPC chapter), and gRPC bidirectional streaming covers similar ground for service-to-service use. WebSockets remain the standard for browser-to-server real-time, precisely because browsers expose a clean WebSocket API but not raw HTTP/2 framing (the same reason browsers can't speak native gRPC — gRPC §19). The comparison across all of these is §18.

Part II · The Protocol

The Upgrade Handshake

Every WebSocket begins with a single, special HTTP request — the upgrade handshake (the HTTP chapter's §19, in full). The client sends a normal-looking GET with headers that say "I'd like to switch this connection to the WebSocket protocol." If the server agrees, it replies with status 101 Switching Protocols — not 200 — and from that instant the connection is no longer HTTP; it's a raw WebSocket carrying frames (§5).

the handshake on the wire

GET /ws/chat HTTP/1.1
Host: api.example.com
Upgrade: websocket                              ← "switch this connection"
Connection: Upgrade                             ← the Upgrade header is meaningful
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==     ← a random client nonce (base64)
Sec-WebSocket-Version: 13                        ← the protocol version
Sec-WebSocket-Protocol: chat.v1                  ← optional subprotocol(s) offered
Origin: https://app.example.com                  ← the browser sends this; SERVER must check (§14)

HTTP/1.1 101 Switching Protocols                ← NOT 200 — the upgrade succeeded
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=  ← proves the server spoke WebSocket
Sec-WebSocket-Protocol: chat.v1                  ← the subprotocol the server picked

# After 101, both sides stop speaking HTTP and start exchanging WebSocket frames.

What `Sec-WebSocket-Accept` proves

The server computes Accept by concatenating the client's Sec-WebSocket-Key with a fixed, magic GUID defined by the spec, SHA-1 hashing it, and base64-encoding the result. This isn't security — the GUID is public — it's a proof of protocol understanding: it confirms the responder is a genuine WebSocket server and not some cache or proxy blindly echoing a 101. The client verifies the value matches before treating the connection as a working WebSocket. Libraries do this for you; you almost never compute it by hand.
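The computation described above can be sketched in a few lines. This is an illustration rather than something you would ship (libraries do it for you); the function name `compute_accept` is made up here, while the GUID is the fixed value from RFC 6455 and the key is the one from the handshake shown earlier.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; it is public and identical for every server.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    # Concatenate key + GUID, SHA-1 hash the result, then base64-encode the digest.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# The key from the example handshake yields the Accept value shown in the 101 response.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```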

Handshake flow

A GET that asks to upgrade, answered by 101

Subprotocols & the header constraint

Sec-WebSocket-Protocol lets client and server negotiate an application-level subprotocol (e.g. chat.v1, or a standard like graphql-ws) — a clean versioning seam. Note one important limitation that shapes auth (§13): the browser WebSocket API doesn't let you set arbitrary HTTP headers on the handshake (you can't add Authorization: Bearer), so credentials must travel another way — a query parameter, a cookie, the subprotocol field, or a first message after connect.

Frames — the Wire Format

After the handshake, data doesn't flow as a raw byte stream — it's chopped into frames, small structured units with a compact header. Framing is what lets one connection carry discrete messages (so the receiver knows where one ends and the next begins), distinguish text from binary, send control signals like close and ping, and split a large message across multiple frames. You rarely touch frames directly — libraries expose "send message / receive message" — but understanding the structure explains masking, message types, and control frames.

Frame structure

A few header bits, then the payload

Text vs binary, and why client frames are masked

Two data types. A frame is either text (UTF-8 — JSON usually rides here) or binary (raw bytes — Protobuf, images, custom formats). The library hands you a string or bytes accordingly.
Masking is mandatory client→server, forbidden server→client. Every client (browsers included) XOR-masks each frame it sends with a fresh random 4-byte key. This isn't encryption (use wss:// for that, §14); it's a security mitigation so a malicious page can't craft bytes that confuse old intermediary proxies into mis-parsing the stream as something else (cache poisoning). Server frames are never masked. Libraries handle this automatically.
Fragmentation. A large message can be split into a sequence of frames: the first carries the text or binary opcode, every later frame carries opcode 0x0 (continuation), and FIN=0 on all but the last. This lets senders stream without buffering the whole thing first.
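To make masking concrete, here is a minimal sketch of how a client could build a single short text frame. The helper names (`mask_payload`, `build_client_text_frame`) are invented for this illustration, and the builder assumes a payload of at most 125 bytes so the length fits in the second header byte; real libraries also handle the extended-length forms.

```python
def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key, cycling through the key."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_client_text_frame(text: str, key: bytes) -> bytes:
    """Build one unfragmented, masked text frame (sketch: payloads <= 125 bytes only)."""
    payload = text.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch does not implement extended payload lengths")
    first = 0x80 | 0x1              # FIN=1, opcode 0x1 (text)
    second = 0x80 | len(payload)    # MASK=1, 7-bit payload length
    return bytes([first, second]) + key + mask_payload(payload, key)


key = bytes([0x37, 0xFA, 0x21, 0x3D])
frame = build_client_text_frame("Hello", key)
print(frame.hex(" "))

# Masking is its own inverse: applying the same key again recovers the payload.
assert mask_payload(frame[6:], key) == b"Hello"
```

Because XOR is symmetric, the server unmasks with exactly the same operation, which is why masking adds no confidentiality at all.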

Message boundaries are preserved — unlike raw TCP

A key convenience over raw TCP: WebSockets are message-oriented, not byte-stream-oriented. You send a message and the other side receives that message whole, not an arbitrary chunk of a byte stream you have to re-delimit yourself. Framing does the delimiting. (Under the hood it's still TCP, so the bytes are ordered and reliable — you just get message boundaries on top.)

The Connection Lifecycle

A WebSocket has a clear life: open → exchange messages → close, with events at each stage your code hooks into. The open completes after the handshake (§4). Then messages flow freely. Closing is itself a small handshake: one side sends a close frame (opcode 0x8) carrying a status code and optional reason, the other echoes a close frame, and both then shut the TCP connection. A clean close lets each side know why the connection ended.

Lifecycle

Open, message, close — with a close-code on the way out

Close code	Meaning
`1000`	Normal closure — done as intended
`1001`	Going away — server shutting down or client navigating away
`1006`	Abnormal — connection dropped with no close frame (network died, crash). You never send this; you observe it.
`1011`	Internal server error
`1008 / 1009`	Policy violation / message too big
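The close frame's payload is small enough to sketch by hand: a 2-byte big-endian status code followed by an optional UTF-8 reason. The function names below are hypothetical, and the sketch covers only the payload, not the frame header. Code 1005 is the reserved "no status code present" value a receiver reports when the payload is empty.

```python
import struct


def encode_close_payload(code: int, reason: str = "") -> bytes:
    """Pack a close status code and optional reason into a close-frame payload."""
    reason_bytes = reason.encode("utf-8")
    # Control frames carry at most 125 payload bytes; 2 are used by the code.
    if len(reason_bytes) > 123:
        raise ValueError("close reason too long for a control frame")
    return struct.pack("!H", code) + reason_bytes


def decode_close_payload(payload: bytes) -> tuple[int, str]:
    """Return (code, reason); an empty payload is reported as 1005 (no status)."""
    if len(payload) == 0:
        return 1005, ""
    if len(payload) == 1:
        raise ValueError("a close payload cannot be a single byte")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


payload = encode_close_payload(1001, "server restarting")
print(decode_close_payload(payload))
```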

Connections die silently — plan for 1006

The clean close handshake only happens when both sides cooperate. In reality connections drop ungracefully all the time — a laptop sleeps, Wi-Fi flaps, a phone switches networks, a NAT times out — and you get no close frame, just a dead socket (code 1006 or a read error, often noticed only much later). You cannot rely on a clean close to detect a gone client. That's exactly why heartbeats exist (§7) and why clients must reconnect (§17).

Ping/Pong & Heartbeats

Because a dead connection often looks identical to an idle one (no bytes either way), you need an active way to tell "still alive" from "silently gone." The protocol provides ping and pong control frames for exactly this: one side sends a ping, the other must answer with a pong. A heartbeat is the pattern of sending pings on a timer and treating a missed pong (within a deadline) as a dead connection to be closed and cleaned up. This is the same liveness problem as Kubernetes probes (containerization §17) and Kafka consumer keepalive, solved with the same idea: don't assume health, verify it.

Heartbeat

Ping on a timer; a missed pong means the connection is dead

Heartbeats also keep idle connections alive through proxies/NATs that would otherwise close a "silent" connection after a timeout — they serve double duty: liveness detection and keepalive.
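The liveness decision itself is just bookkeeping, and can be sketched without any networking. The class below is a hypothetical helper, not part of any library: a timer task would call `ping_sent` after writing a ping frame, the read loop would call `pong_received` when a pong arrives, and a periodic sweep would close any connection for which `is_dead` returns true. Time is passed in explicitly so the logic is easy to test.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks one connection's outstanding ping and decides when it is dead."""

    def __init__(self, pong_timeout: float):
        self.pong_timeout = pong_timeout
        self.awaiting_since: Optional[float] = None  # when the unanswered ping was sent

    def ping_sent(self, now: float) -> None:
        # Keep the timestamp of the oldest unanswered ping.
        if self.awaiting_since is None:
            self.awaiting_since = now

    def pong_received(self) -> None:
        self.awaiting_since = None  # the peer answered, so it is alive

    def is_dead(self, now: float) -> bool:
        # Dead only if a ping is outstanding and the deadline has passed.
        return (
            self.awaiting_since is not None
            and now - self.awaiting_since > self.pong_timeout
        )


hb = HeartbeatMonitor(pong_timeout=10.0)
hb.ping_sent(now=0.0)
hb.pong_received()
print(hb.is_dead(now=100.0))   # False: idle, but the last ping was answered
hb.ping_sent(now=100.0)
print(hb.is_dead(now=111.0))   # True: pong missed its 10s deadline
```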

Always run heartbeats

A real-time server without heartbeats slowly fills with zombie connections — clients that vanished but whose sockets the server still holds, leaking memory and goroutines/tasks (§19). Set a ping interval and a pong deadline, drop connections that miss it, and free their resources. It's not optional for production; it's how you keep the connection table honest.

Part III · Building It

A WebSocket Server

A server endpoint does three things: accept the upgrade (§4), then run a loop reading messages and a way to write them. In Go the standard library doesn't ship a WebSocket implementation, so you use a library — gorilla/websocket (the long-time standard) or coder/websocket (nhooyr). In Python, frameworks like FastAPI/Starlette expose WebSockets directly, and the websockets library is the standalone standard. Here is a minimal echo server — the "hello world" of WebSockets.

// github.com/gorilla/websocket
package main

import (
    "log"
    "net/http"

    "github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
    // CheckOrigin guards against cross-site hijacking; DO NOT leave it open (§14).
    CheckOrigin: func(r *http.Request) bool {
        return r.Header.Get("Origin") == "https://app.example.com"
    },
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil) // performs the 101 handshake (§4)
    if err != nil {
        return // upgrader already wrote an HTTP error
    }
    defer conn.Close() // always close → frees the socket (§19)

    for { // the READ LOOP: one per connection
        mt, msg, err := conn.ReadMessage()
        if err != nil {
            break // client closed or connection died (§6) → exit, cleanup via defer
        }
        // echo it straight back
        if err := conn.WriteMessage(mt, msg); err != nil {
            break
        }
    }
}

func main() {
    http.HandleFunc("/ws", wsHandler) // a normal HTTP route that upgrades
    // ListenAndServe only returns on failure, so surface the error.
    log.Fatal(http.ListenAndServe(":8080", nil))
}

# FastAPI / Starlette expose WebSockets natively (the `websockets` lib is an alternative).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def ws_handler(ws: WebSocket):
    # Validate origin yourself before accepting (§14); then complete the handshake.
    await ws.accept()                      # performs the 101 handshake (§4)
    try:
        while True:                        # the READ LOOP — one coroutine per connection
            msg = await ws.receive_text()  # awaits the next message
            await ws.send_text(msg)        # echo it straight back
    except WebSocketDisconnect:
        pass                               # client closed or connection died (§6)
    # FastAPI cleans up the connection when the coroutine returns

# uvicorn app:app  — uvicorn speaks the WebSocket protocol for you

Go gives each connection a goroutine running a blocking read loop; Python gives each connection an async coroutine awaiting messages. Same structure — accept, loop reading, write — expressed in each language's concurrency model.

The read loop and the write must be coordinated

A subtle but critical rule (especially in Go with gorilla): concurrent writes to one connection are not safe, and you typically want one goroutine reading and another writing. The standard pattern is a dedicated writer goroutine fed by a channel, so all writes are serialized. Don't write to the same connection from multiple goroutines without synchronization — it corrupts the frame stream. The hub pattern (§10) builds exactly this structure.

A WebSocket Client

A backend is often a WebSocket client too: consuming a real-time feed from another service, bridging systems, or in tests. The client side: dial the ws:// or wss:// URL (which performs the handshake), then read and write messages. The shape mirrors the server.

// github.com/gorilla/websocket
package main

import (
    "log"
    "net/http"

    "github.com/gorilla/websocket"
)

// runClient takes the bearer token as a parameter; how you obtain it is up to you.
func runClient(token string) {
    // Dial performs the upgrade handshake; header carries auth where allowed (§13).
    h := http.Header{"Authorization": {"Bearer " + token}}
    conn, _, err := websocket.DefaultDialer.Dial("wss://api.example.com/ws", h)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // reader goroutine
    go func() {
        for {
            _, msg, err := conn.ReadMessage()
            if err != nil {
                return
            }
            log.Printf("recv: %s", msg)
        }
    }()

    // send a message
    if err := conn.WriteMessage(websocket.TextMessage, []byte(`{"type":"hello"}`)); err != nil {
        log.Printf("write: %v", err)
        return
    }

    // graceful close: send a close frame, then the socket shuts (§6)
    if err := conn.WriteMessage(websocket.CloseMessage,
        websocket.FormatCloseMessage(websocket.CloseNormalClosure, "bye")); err != nil {
        log.Printf("close: %v", err)
    }
}

# `websockets` library: clean async client
import asyncio
import os

import websockets

async def run_client(token: str):
    # additional_headers carries auth where the runtime allows it (§13);
    # browsers can't set headers, but a backend client can.
    # (Recent `websockets` releases name this parameter additional_headers;
    #  the legacy client in older releases called it extra_headers.)
    async with websockets.connect(
        "wss://api.example.com/ws",
        additional_headers={"Authorization": f"Bearer {token}"},
    ) as ws:                                 # context manager → handshake + clean close
        await ws.send('{"type":"hello"}')    # send a message

        async for msg in ws:                 # iterate incoming messages
            print("recv:", msg)
    # leaving the `async with` block sends a close frame and shuts down (§6)

# WS_TOKEN is a placeholder environment variable; supply the token however you obtain it.
asyncio.run(run_client(os.environ["WS_TOKEN"]))

Backend client = consumer of a stream

When your service is the client, treat the feed like any unreliable upstream: the connection will drop (§6), so wrap the connect-and-read in a reconnect loop with backoff (§17), and make processing idempotent if the feed can re-deliver on reconnect (the same at-least-once discipline as the Kafka chapter). A backend WebSocket client without reconnection logic silently stops receiving the first time the network hiccups.

Connection Management

One connection is trivial; the real work is managing many. A server holding thousands of live connections needs a central place that tracks who's connected so it can route and broadcast messages. The canonical solution is the hub (or "connection manager"): a single owner of the set of active connections, with channels/locks to register, unregister, and send — sidestepping the concurrent-write hazard from §8 by funneling everything through one coordinator.

The hub pattern

A central registry owns the connections; clients register and receive broadcasts

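// Client is one connected peer; its writer goroutine drains `send` (§8).
type Client struct {
    send chan []byte // buffered outbound queue for this client
}

type Hub struct {
    clients    map[*Client]bool
    register   chan *Client
    unregister chan *Client
    broadcast  chan []byte
}

// NewHub initializes the map and channels (writing to a nil map would panic).
func NewHub() *Hub {
    return &Hub{
        clients:    make(map[*Client]bool),
        register:   make(chan *Client),
        unregister: make(chan *Client),
        broadcast:  make(chan []byte),
    }
}

// One goroutine owns the map → all mutation is serialized here (no locks needed).
func (h *Hub) Run() {
    for {
        select {
        case c := <-h.register:
            h.clients[c] = true
        case c := <-h.unregister:
            if _, ok := h.clients[c]; ok {
                delete(h.clients, c)
                close(c.send) // tell the client's writer goroutine to stop
            }
        case msg := <-h.broadcast:
            for c := range h.clients {
                select {
                case c.send <- msg:        // queue into the client's buffered channel
                default:                   // buffer full → slow client; drop it (§12)
                    delete(h.clients, c)
                    close(c.send)
                }
            }
        }
    }
}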

from fastapi import WebSocket

class Hub:
    def __init__(self):
        self.clients: set[WebSocket] = set()   # the active connection set

    async def register(self, ws: WebSocket):
        await ws.accept()
        self.clients.add(ws)

    def unregister(self, ws: WebSocket):
        self.clients.discard(ws)               # idempotent removal

    async def broadcast(self, message: str):
        dead = []
        # Iterate over a snapshot: every `await` yields to the event loop, so another
        # task may register/unregister while this loop is suspended.
        for ws in list(self.clients):
            try:
                await ws.send_text(message)     # fan out to everyone
            except Exception:
                dead.append(ws)                 # send failed → connection is gone
        for ws in dead:
            self.unregister(ws)                 # clean up on the way out

hub = Hub()
# (asyncio is single-threaded, so the set needs no lock; but an `await` inside a
#  loop over the live set lets other tasks mutate it mid-iteration, hence the
#  snapshot. Collect dead ones, remove after.)

Concurrency & cleanup are the whole game

Two things separate a toy from a real server. Concurrency safety: never touch the shared connection set from many goroutines/tasks without coordination (Go: one owner goroutine + per-client send channel; Python asyncio: single-threaded, but don't mutate while iterating). Cleanup: every path that ends a connection must remove it from the registry and free its resources, or you leak (§19). Register on connect, always unregister on disconnect — including error and panic paths.

Broadcasting & Rooms

Real apps rarely send to everyone — they send to a relevant subset: the members of one chat room, the subscribers to one stock symbol, the players in one game. The pattern is rooms (a.k.a. channels or topics): connections subscribe to named groups, and you broadcast to a group rather than the whole server. Concretely, the hub holds a map from room name to the set of connections in it.

Rooms

Connections grouped; a message goes only to its room

Rooms are just a grouping — until you scale out

On a single server, a room is simply map[roomName] → set of connections, and broadcasting iterates that set. The catch arrives with multiple servers (§15): a room's members may be spread across different instances, so a message produced on one server must reach members connected to another. The in-memory room map alone can't do that — which is the whole reason the backplane exists. Build rooms simply first; reach for the backplane when you outgrow one instance.
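As a rough single-instance sketch of that map, the registry below groups connections by room name. `RoomRegistry` and `FakeConn` are names invented for this example; `FakeConn` merely records what it was sent so the routing can be observed without a network.

```python
from collections import defaultdict


class FakeConn:
    """Hypothetical stand-in for a real connection; records what it receives."""

    def __init__(self, name: str):
        self.name = name
        self.inbox: list[str] = []

    def send(self, message: str) -> None:
        self.inbox.append(message)


class RoomRegistry:
    def __init__(self):
        self._rooms: dict[str, set] = defaultdict(set)  # room name -> connections

    def join(self, room: str, conn) -> None:
        self._rooms[room].add(conn)

    def leave(self, room: str, conn) -> None:
        members = self._rooms.get(room)
        if members is None:
            return
        members.discard(conn)
        if not members:
            del self._rooms[room]  # drop empty rooms so the map does not leak

    def leave_all(self, conn) -> None:
        # Call this on disconnect so no room keeps a dead connection.
        for room in list(self._rooms):
            self.leave(room, conn)

    def broadcast(self, room: str, message: str) -> int:
        members = list(self._rooms.get(room, ()))  # snapshot before sending
        for conn in members:
            conn.send(message)
        return len(members)


rooms = RoomRegistry()
alice, bob, carol = FakeConn("alice"), FakeConn("bob"), FakeConn("carol")
rooms.join("general", alice)
rooms.join("general", bob)
rooms.join("random", carol)
print(rooms.broadcast("general", "hi"))  # 2: only members of "general" receive it
```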

Backpressure & Slow Consumers

A failure mode unique to push systems: what happens when you produce messages faster than a client can receive them? A phone on a weak connection, a browser tab throttled in the background — the client drains slowly while the server keeps generating data for it. Without a plan, that data piles up in a per-client buffer that grows without bound, and one slow client can exhaust the server's memory. This is backpressure, and you must decide a policy.

The slow consumer

A bounded send buffer; when it fills, drop or disconnect — never grow forever

The standard implementation (visible in the hub of §10): give each client a bounded send queue. When you try to enqueue and it's full, you don't block the broadcaster (that would let one slow client stall everyone) — instead you apply a policy:

Drop messages — for lossy data where only the latest matters (a live position, a ticker), discard the overflow or keep only the newest. The slow client misses some updates but the server stays healthy.
Disconnect the client — for data where gaps are unacceptable, close the slow connection and let it reconnect and re-sync (§17). Better to drop one client than degrade all.
Conflate / batch — collapse multiple pending updates into one (send the latest snapshot rather than every intermediate step).
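The first two policies can be sketched with a small bounded queue. `BoundedSendQueue` and its policy names are made up for this illustration; a real server would pair one such queue with each connection's writer task and close the socket when `offer` returns False.

```python
from collections import deque


class BoundedSendQueue:
    """Per-client outbound queue with a fixed capacity and an explicit overflow policy."""

    POLICIES = ("drop_oldest", "drop_newest", "disconnect")

    def __init__(self, maxsize: int, policy: str = "drop_oldest"):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.maxsize = maxsize
        self.policy = policy
        self.items: deque = deque()
        self.dropped = 0

    def offer(self, message) -> bool:
        """Enqueue without blocking. Returns False if the client should be disconnected."""
        if len(self.items) < self.maxsize:
            self.items.append(message)
            return True
        if self.policy == "disconnect":
            return False                 # caller closes the slow connection
        self.dropped += 1
        if self.policy == "drop_oldest":
            self.items.popleft()         # keep the freshest data
            self.items.append(message)
        # drop_newest: the incoming message is simply discarded
        return True


ticker = BoundedSendQueue(maxsize=2, policy="drop_oldest")
for price in (101, 102, 103):
    ticker.offer(price)
print(list(ticker.items), ticker.dropped)  # [102, 103] 1
```

Note that `offer` never blocks, which is the property that keeps one slow client from stalling the broadcaster.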

A bounded buffer is non-negotiable

The cardinal rule of push systems: never buffer unboundedly per connection. Every client gets a fixed-size queue and an explicit overflow policy (drop, disconnect, or conflate). This is the real-time analogue of Kafka's slow-consumer handling and the backpressure concerns in any streaming system — one slow consumer must never be able to take down the producer.

Part IV · Production

Authentication

Authenticating a WebSocket is trickier than a normal request because of a browser limitation from §4: the JavaScript WebSocket constructor can't set custom headers, so the usual Authorization: Bearer isn't available from a browser. You authenticate at the handshake (it's still an HTTP request — the server can read cookies, the URL, and the subprotocol) and you do it before calling Upgrade/accept. The principle from the auth chapter holds: verify identity at the door; an unauthenticated socket should never be upgraded.

Approach	How	Notes
Cookie / session	Browser sends the auth cookie on the handshake automatically	Natural for same-site web apps; pair with strict origin checks (§14) to resist CSWSH
Token in query string	`wss://host/ws?token=JWT`	Works everywhere, but the token can land in logs/history — use short-lived tokens
Subprotocol field	Smuggle the token in `Sec-WebSocket-Protocol`	Avoids the URL; a common JWT-over-WS trick
First message	Connect, then send an auth message before anything else	Flexible; the server must reject/close if auth doesn't arrive promptly
Header (non-browser)	Backend clients can set `Authorization` (§9)	Cleanest — but only for non-browser clients

func wsHandler(w http.ResponseWriter, r *http.Request) {
    // AUTHENTICATE FIRST — before upgrading. Reject with a normal HTTP error.
    token := r.URL.Query().Get("token")          // or read a cookie / subprotocol
    user, err := verifyJWT(token)                 // your auth logic (auth chapter)
    if err != nil {
        http.Error(w, "unauthorized", http.StatusUnauthorized) // 401, no upgrade
        return
    }
    // Only now perform the upgrade; attach the identity to the connection.
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        return
    }
    serve(conn, user) // every message from this conn is now tied to `user`
}

@app.websocket("/ws")
async def ws_handler(ws: WebSocket):
    # AUTHENTICATE FIRST — before accept(). Close with a policy code if it fails.
    token = ws.query_params.get("token")          # or a cookie / subprotocol
    user = verify_jwt(token)                       # your auth logic (auth chapter)
    if user is None:
        await ws.close(code=1008)                  # 1008 = policy violation; no accept
        return
    await ws.accept()                              # only now complete the handshake
    await serve(ws, user)                          # messages are tied to `user`

Authorize on every message, not just at connect

Authentication at the handshake establishes who the connection belongs to. But a long-lived socket outlives a single action, so you must still authorize each message against what that user is allowed to do (can they post to this room? send this command?). This is the same authz-on-every-action rule as REST. Also consider that a token can expire mid-connection: long sessions need a re-auth or token-refresh strategy, or you'll have a connection authenticated by a credential that's no longer valid.
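One way to picture the per-message check is a small gate that runs before any handler. Everything here is hypothetical (the `Session` fields, the `post` message type, the result strings); it only illustrates the two checks described above: is the credential still valid, and is this user allowed to do this particular thing.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Identity attached to the connection at handshake time (illustrative fields)."""
    user_id: str
    rooms_allowed: set
    token_expires_at: float  # Unix timestamp taken from the verified token


def authorize(session: Session, message: dict, now: float) -> tuple[bool, str]:
    """Decide whether this message may be processed; returns (allowed, reason)."""
    # The credential that opened the socket may have expired since then.
    if now >= session.token_expires_at:
        return False, "token_expired"
    # Authenticated does not mean authorized: check the specific action.
    if message.get("type") == "post" and message.get("room") not in session.rooms_allowed:
        return False, "forbidden"
    return True, "ok"


session = Session(user_id="u1", rooms_allowed={"general"}, token_expires_at=1000.0)
print(authorize(session, {"type": "post", "room": "general"}, now=10.0))
print(authorize(session, {"type": "post", "room": "admin"}, now=10.0))
print(authorize(session, {"type": "post", "room": "general"}, now=2000.0))
```

On "token_expired" a server might close with a policy code or ask the client to refresh; which one fits is a design decision this sketch leaves open.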

Security

WebSockets inherit the web's threat model plus a few of their own. The security chapter's principles apply directly; these are the WebSocket-specific essentials.

Concern	What & why
Use `wss://` (TLS)	Plain `ws://` is unencrypted — readable and tamperable on the wire. Always use `wss://` in production, exactly as HTTPS for HTTP (TLS, the HTTP chapter §20).
Validate `Origin` (CSWSH)	The browser sends an `Origin` header on the handshake; the server must check it. Skipping this enables Cross-Site WebSocket Hijacking — a malicious site opening an authenticated socket using the victim's cookies (the WS cousin of CSRF). Allowlist your own origins.
Validate every message	Treat inbound frames as untrusted input (the validation chapter): check structure, types, sizes; reject malformed messages. The connection being authenticated doesn't make its payloads safe.
Cap message size	Set a max frame/message size so a client can't send a giant payload to exhaust memory (close with `1009` if exceeded).
Rate-limit	A persistent connection can flood you with messages; rate-limit per connection/user to prevent abuse and accidental loops.
Bound connections per user	Limit concurrent connections per identity so one client can't open thousands and exhaust the connection table (§19).
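For the rate-limit row, a token bucket is one common choice (the text does not prescribe an algorithm, so treat this as an assumption). The sketch keeps one bucket per connection or user; `rate` and `capacity` are illustrative parameters, and time is passed in explicitly to keep the logic testable.

```python
class TokenBucket:
    """Allows bursts up to `capacity` messages, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so a fresh connection can burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0   # spend one token for this message
            return True
        return False             # over the limit: drop, warn, or close the connection


bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                       # True: one token refilled after 1s
```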

A permissive CheckOrigin is a trap

The single most common WebSocket security hole: leaving origin checking disabled. Gorilla's default CheckOrigin rejects cross-origin requests, but countless tutorials tell you to override it with return true — which opens you to Cross-Site WebSocket Hijacking for any cookie-authenticated endpoint. Always allowlist the specific origins you trust (as in the §8 server), never blanket-allow. If you authenticate with bearer tokens rather than cookies the risk is lower, but origin validation is still the right default.
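The same allowlist idea, sketched framework-independently. The function name and the decision to reject a missing `Origin` header are assumptions for this example (non-browser clients often send no `Origin`, so an endpoint that serves them would need a different rule); the allowed origin is the one used in the §8 server.

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def origin_allowed(origin_header: Optional[str], allowed=ALLOWED_ORIGINS) -> bool:
    """Exact-match the handshake's Origin against an allowlist; never blanket-allow."""
    if not origin_header:
        return False  # assumption: this endpoint only serves browsers
    parts = urlsplit(origin_header)
    if not parts.scheme or not parts.netloc:
        return False  # e.g. the literal string "null"
    # Compare scheme + host[:port] exactly; prefix or substring checks are unsafe.
    normalized = f"{parts.scheme.lower()}://{parts.netloc.lower()}"
    return normalized in allowed


print(origin_allowed("https://app.example.com"))                # True
print(origin_allowed("https://app.example.com.evil.example"))   # False
```

Exact comparison matters: a check like "starts with https://app.example.com" would accept the second origin above.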

Scaling — the Backplane

This is the defining hard problem of WebSockets, and the reason they complicate architecture far more than stateless HTTP. A WebSocket connection is stateful and pinned to one server: the socket lives in the memory of the specific instance the client connected to. The moment you run more than one instance — which you must, for capacity and availability — a painful question appears: if user A is connected to server 1 and user B (in the same chat room) is connected to server 2, how does A's message reach B? Server 1 has no access to server 2's connections.

The scaling problem & its fix

Connections are stranded on different instances; a pub/sub backplane bridges them

The solution is a backplane (a.k.a. pub/sub adapter): an external message system that all server instances connect to. When a server needs to deliver a message to a client that may be on any instance, it publishes the message to the backplane; every instance is subscribed, receives it, and forwards it to whichever of its local connections should get it. The instances no longer need to know about each other's connections — they coordinate through the shared bus. Common backplanes:

Redis Pub/Sub — the most common choice: simple, fast, low-latency, fire-and-forget. Perfect when you only need live fan-out and don't need history (a disconnected user simply misses messages).
Kafka — when you also want durability, replay, and the events to feed other systems (the Kafka chapter): publish events to a topic, every WS instance consumes and forwards. Heavier, but the messages become part of your durable event stream.
NATS / cloud pub/sub — other fast messaging options with similar mechanics.
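The publish-then-forward flow can be simulated in-process. In this sketch `InMemoryBus` is a hypothetical stand-in for Redis Pub/Sub (or any of the options above), and `Instance` models one server holding its own local connections as plain lists; none of these names come from a real library.

```python
from collections import defaultdict


class InMemoryBus:
    """Hypothetical stand-in for the backplane: fire-and-forget fan-out per channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subscribers.get(channel, [])):
            callback(message)


class Instance:
    """One server process: knows only its own connections, coordinates via the bus."""

    def __init__(self, name: str, bus: InMemoryBus):
        self.name = name
        self.bus = bus
        self.local = defaultdict(dict)  # room -> {user: inbox list}
        self._subscribed = set()

    def connect(self, user: str, room: str) -> list:
        inbox: list = []
        self.local[room][user] = inbox
        if room not in self._subscribed:  # subscribe once per room per instance
            self.bus.subscribe(room, lambda msg, room=room: self._deliver_local(room, msg))
            self._subscribed.add(room)
        return inbox

    def send(self, room: str, message: str) -> None:
        # Publish instead of delivering directly: recipients may live on any instance.
        self.bus.publish(room, message)

    def _deliver_local(self, room: str, message: str) -> None:
        for inbox in self.local[room].values():
            inbox.append(message)


bus = InMemoryBus()
server1, server2 = Instance("server-1", bus), Instance("server-2", bus)
a_inbox = server1.connect("A", "general")
b_inbox = server2.connect("B", "general")
server1.send("general", "hello from A")
print(a_inbox, b_inbox)  # both inboxes receive it, although B is on another instance
```

Server 1 never touches server 2's connection table; it only publishes, which is the decoupling the backplane buys you.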

The mental shift: instances are interchangeable, the bus coordinates

The backplane turns "N servers each with their own islands of connections" into "N servers that collectively behave as one." It's the same decoupling idea as the Kafka chapter applied to live delivery: a publisher doesn't need to know which instance hosts a recipient — it publishes, and whoever holds the connection delivers. You cannot horizontally scale WebSockets without something playing this role; designing it in early is far easier than retrofitting it.

Load Balancing & Sticky Sessions

Putting a load balancer in front of WebSocket servers has its own wrinkles, because the connection is long-lived and stateful rather than a quick request. Three things matter: the LB must support WebSockets, connections usually need affinity, and deploys must drain connections gracefully.

The LB must speak WebSocket. It has to pass through the Upgrade handshake and then keep the connection open (an L7/HTTP-aware proxy, or L4 TCP pass-through). Most modern LBs (Nginx, HAProxy, cloud ALBs, the Kubernetes Ingress controllers of chapter 21) support this, but you configure longer idle timeouts — a default 60s idle timeout will kill quiet WebSockets, which is another reason for heartbeats (§7).
Sticky sessions / affinity. Once a client is connected to instance 1, all its frames must keep going to instance 1 (that's where its socket lives). With raw TCP that's automatic (one connection = one backend), but reconnections and any per-message routing want session affinity so a client returns to a consistent instance. With a backplane (§15) affinity matters less for delivery (any instance can publish/receive), but it still affects local state.
Connection draining on deploy. The big one: deploying a new version (the rolling updates of chapter 21 §18) terminates instances, and every WebSocket on a terminating instance drops. You must drain gracefully — stop accepting new connections, send a close frame (code 1001 "going away") so clients reconnect cleanly to a healthy instance, and allow time before SIGKILL (graceful shutdown chapter). Even so, a deploy causes a reconnection storm — which is why robust client reconnection (§17) is mandatory.
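
The drain sequence described above can be sketched with asyncio. This is an illustration under stated assumptions, not a production shutdown handler: `FakeConn` and `DrainableServer` are hypothetical stand-ins, and a real library's close call may have a different name or signature.

```python
import asyncio

GOING_AWAY = 1001  # RFC 6455 close code: the endpoint is going away


class FakeConn:
    """Hypothetical stand-in for a library's connection object."""

    def __init__(self):
        self.closed_with = None

    async def close(self, code, reason=""):
        await asyncio.sleep(0)  # pretend to write the close frame
        self.closed_with = code


class DrainableServer:
    def __init__(self):
        self.accepting = True
        self.conns = set()

    def register(self, conn):
        if not self.accepting:
            raise RuntimeError("draining: not accepting new connections")
        self.conns.add(conn)

    async def drain(self, timeout=10.0):
        # 1. Stop accepting, so new connections land on healthy instances.
        self.accepting = False
        # 2. Send 1001 to every live connection concurrently.
        closes = asyncio.gather(
            *(conn.close(GOING_AWAY, "server restarting") for conn in self.conns),
            return_exceptions=True,
        )
        # 3. Bound the wait so shutdown finishes before the hard kill arrives.
        try:
            await asyncio.wait_for(closes, timeout=timeout)
        except asyncio.TimeoutError:
            pass  # some clients never acknowledged; give up on them
        self.conns.clear()
```

In a real service `drain` would typically be triggered from the SIGTERM handler, with `timeout` set comfortably below the orchestrator's termination grace period.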

Every deploy disconnects everyone

Unlike stateless HTTP — where a rolling deploy is invisible — rolling a WebSocket fleet drops every live connection as old instances retire. This is fundamental to stateful connections, not a bug. Plan for it: graceful drain with 1001 on the server, exponential-backoff reconnection with jitter on the client (§17, to avoid a thundering-herd reconnect), and message design that tolerates a brief gap and re-sync. Treat reconnection as the normal case, not the exception.

Reconnection & Resilience

Given everything above — connections drop silently (§6), deploys disconnect everyone (§16), networks are flaky — a real-time client that doesn't automatically reconnect is broken by design. Resilience lives mostly on the client, with server-side support for re-syncing missed state. (Note: SSE gives you reconnection for free — another reason to prefer it when you only need server push, §2.)

Figure: Reconnect with backoff. On drop, retry with growing delays + jitter, then re-sync state.

The resilience checklist

Reconnect with exponential backoff + jitter. Don't hammer a struggling server with instant retries; grow the delay (1s, 2s, 4s, … capped) and add randomness so thousands of clients disconnected by a deploy don't all reconnect at the same instant (the thundering-herd / reconnect-storm problem from §16).
Re-sync after reconnect. A new connection means a possible gap in the stream. On reconnect, the client should reconcile: fetch current state via a normal HTTP call, or tell the server "last event I saw was N, send me everything since" (a resume token / sequence number). This is where a durable backplane like Kafka (§15) shines — the missed messages are still in the log to replay.
Decide your delivery guarantee. As with Kafka (Kafka chapter, §9), WebSocket delivery isn't magically exactly-once. For critical data, give messages sequence numbers/IDs so the client can detect gaps and duplicates, and make handling idempotent. For lossy data (live positions), a gap is fine: the next update corrects it.
Buffer outbound on the client while disconnected (within limits), and flush on reconnect — so a user's action taken during a blip isn't simply lost.
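
The first checklist item can be sketched as a small delay generator. This version assumes the "full jitter" variant (each delay is drawn uniformly between zero and the capped exponential ceiling), which is one common choice rather than the only one; `backoff_delays`, `connect_with_retry`, and their parameters are hypothetical names for illustration.

```python
import random
import time


def backoff_delays(base=1.0, cap=30.0, attempts=8, rng=random.random):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, ... up to cap
        yield rng() * ceiling  # random point in [0, ceiling)


def connect_with_retry(connect, sleep=time.sleep, **backoff):
    """Call connect() until it succeeds, sleeping a jittered delay between tries."""
    last_error = None
    for delay in backoff_delays(**backoff):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            sleep(delay)
    raise ConnectionError("gave up reconnecting") from last_error
```

Passing `rng` and `sleep` as parameters keeps the logic testable without real waiting. After a successful reconnect the client would still need to re-sync state, as the second checklist item describes.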

Reconnection is a first-class feature

The connection dropping is normal operation, not an error case — so reconnection logic isn't a nice-to-have, it's core functionality you build and test deliberately. A real-time system's reliability is mostly determined by how gracefully it handles the constant churn of connections coming and going. Design for the drop.

Part V · Practice & Comparison

WebSockets vs SSE vs Streaming vs Polling

The decision that should come before you build anything real-time. WebSockets are powerful but carry the operational weight of Part IV (stateful connections, backplane, sticky sessions, reconnection), so picking the simplest tool that fits is a real cost saving. Four contenders, drawing together the spectrum (§2) and the gRPC chapter.

Need	Best fit	Why
Rare updates, latency tolerant	Polling	Trivial, stateless, no infra. Don't over-engineer.
Server → browser push, one-way	SSE	Just HTTP, auto-reconnect, sails through proxies; far simpler than WS. Feeds, notifications, live dashboards.
Browser ↔ server, frequent both ways	WebSocket	The standard choice for true low-latency bidirectional traffic in a browser. Chat, games, collaborative editing, trading UIs.
Service ↔ service streaming	gRPC streaming	Typed, efficient, HTTP/2-native (gRPC chapter). For internal backends, not browsers.

Figure: Direction & reach. What each supports, at a glance.

The rule: simplest that fits

Pick the leftmost option that still works. Need server push to a browser but not client→server chatter? SSE, not WebSockets: you skip the entire backplane/sticky-session burden of Part IV. Genuinely need frequent bidirectional browser traffic? WebSockets. Two backend services? gRPC streaming. Reaching for WebSockets when SSE or even polling would do is one of the most common and costly over-engineering mistakes in real-time work.
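
As a rough sketch, the decision table can be written as a function. The three boolean inputs are a deliberate simplification assumed here for illustration; `choose_transport` and its parameter names are hypothetical, and real decisions weigh more factors than this.

```python
def choose_transport(*, browser: bool, rare_updates: bool,
                     client_sends_frequently: bool) -> str:
    """Return the simplest transport that fits, following the table above."""
    if not browser:
        return "grpc-streaming"  # service-to-service streaming
    if rare_updates:
        return "polling"         # latency tolerant: don't over-engineer
    if client_sends_frequently:
        return "websocket"       # frequent traffic in both directions
    return "sse"                 # one-way server push to a browser
```

A live dashboard that only receives updates (`browser=True`, `rare_updates=False`, `client_sends_frequently=False`) comes out as SSE, while a chat client that also sends frequently comes out as WebSocket.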

Common Pitfalls

The failure modes that bite WebSocket servers in production — most are about resource management for long-lived, stateful connections, a discipline stateless HTTP never demanded.

Pitfall	What happens	Fix
Connection / goroutine leaks	A disconnect path forgets to unregister & free; goroutines/tasks and memory accumulate until the server dies	Always clean up on every exit path (defer/finally); register on connect, unregister on disconnect (§10)
No heartbeat → zombies	Dead connections look idle (§6); the table fills with sockets to clients that vanished	Ping on a timer; drop on missed pong (§7)
Unbounded send buffers	One slow client's queue grows without limit and OOMs the server	Bounded per-client queue + drop/disconnect policy (§12)
Concurrent writes to one conn	Multiple goroutines writing the same socket corrupt the frame stream	One writer goroutine fed by a channel (§8, §10)
Blocking the read loop	Doing slow work inline in the read loop stalls that connection (and trips heartbeats)	Hand work to a worker; keep the read loop fast (the queue idea, ch.10)
No origin check	Cross-Site WebSocket Hijacking on cookie-auth endpoints (§14)	Allowlist origins; never blanket-allow
Forgetting the backplane	Works on one instance; messages vanish between users once you scale out (§15)	Design the pub/sub backplane in early
No reconnection logic	Client silently stops receiving after the first network blip or deploy	Reconnect with backoff + jitter, then re-sync (§17)

Stateful connections need lifecycle discipline

Notice the pattern: nearly every pitfall is a resource that wasn't bounded or freed — a leaked connection, an unbounded buffer, a zombie socket. Stateless HTTP forgives sloppiness here because each request is short-lived and self-cleaning; a WebSocket lives for hours and holds memory the whole time. Treat every connection as something you must explicitly account for from open to close, and most production problems never appear.
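
Two of the fixes in the table, a bounded per-client queue and cleanup on every exit path, can be sketched together. This is a minimal illustration assuming a disconnect-on-overflow policy; `Client`, `Hub`, and `handle` are hypothetical names, and a real server would also close the underlying socket and run a writer task that drains `outbox`.

```python
import asyncio


class Client:
    def __init__(self, name, max_queued=2):
        self.name = name
        self.outbox = asyncio.Queue(maxsize=max_queued)  # bounded send buffer
        self.closed = False


class Hub:
    def __init__(self):
        self.clients = set()

    def register(self, client):
        self.clients.add(client)

    def unregister(self, client):
        self.clients.discard(client)  # safe to call more than once
        client.closed = True

    def broadcast(self, message):
        for client in list(self.clients):  # copy: we may unregister mid-loop
            try:
                client.outbox.put_nowait(message)
            except asyncio.QueueFull:
                # Slow consumer: drop the client rather than buffer without limit.
                self.unregister(client)


async def handle(hub, client, incoming):
    """Read loop for one connection; cleanup runs on every exit path."""
    hub.register(client)
    try:
        async for _message in incoming:
            pass  # keep the loop fast; hand real work to a worker
    finally:
        hub.unregister(client)
```

Disconnecting is only one overflow policy; dropping the oldest message or conflating updates are alternatives when a brief gap is acceptable.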

Designing Real-Time Systems

Pulling the manual into design judgment for anything real-time.

Design principles

Choose the lightest transport first. Polling → SSE → WebSocket → gRPC streaming, leftmost that fits (§18). Don't pay for WebSocket complexity unless you need bidirectional browser traffic.
Design the backplane in from day one if you'll ever run more than one instance — which is always, in production (§15). Retrofitting cross-instance delivery into a single-instance design is painful.
Treat disconnection as normal. Heartbeats to detect it (§7), graceful drain on deploy (§16), client reconnection with backoff + jitter and state re-sync (§17). The connection will churn; build for it.
Bound everything per connection. Send-buffer size, message size, message rate, connections per user (§12, §14, §19). Unbounded anything is an outage waiting for one bad client.
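Don't assume reliable, ordered, exactly-once delivery end to end. TCP keeps messages ordered within one connection, but reconnects and the backplane can introduce gaps and duplicates. Add sequence numbers/IDs for gap detection and idempotent handling where it matters; accept loss where it doesn't (§17, echoing §9 of the Kafka chapter).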
Authenticate at the handshake, authorize every message, secure the channel. wss://, origin checks, validated inbound messages (§13–14).
Keep the read loop fast; offload real work to a queue/worker (ch.10), and serialize writes through one writer per connection (§8).
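
The sequence-number principle can be illustrated with a small client-side tracker. It assumes the server stamps each message with a consecutive integer starting at 1; `SequenceTracker` and its return labels are hypothetical names chosen for this sketch.

```python
class SequenceTracker:
    """Classify incoming sequence numbers as ok, duplicate, or gap."""

    def __init__(self, last_seen=0):
        self.last_seen = last_seen

    def accept(self, seq):
        if seq <= self.last_seen:
            return "duplicate"  # already applied: skip it (idempotent handling)
        if seq == self.last_seen + 1:
            self.last_seen = seq
            return "ok"
        # Something between last_seen and seq was missed. Do not advance:
        # the caller should ask the server for everything after last_seen.
        return "gap"
```

On a `"gap"` result the client would send its `last_seen` value as the resume token, apply the replayed messages in order, and then continue with the live stream.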

When WebSockets, and when not

Reach for WebSockets for genuinely interactive, low-latency, bidirectional browser experiences: chat, multiplayer games, collaborative editing, live trading/betting interfaces, real-time dashboards with client interaction. Don't reach for them for one-way server push (use SSE), for occasional updates (polling), for request/response (plain HTTP/REST — the HTTP chapter), or for service-to-service streaming (gRPC). The skill, as always, is matching the transport to the actual interaction shape rather than reaching for the most powerful tool by reflex.

Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

Concept	One-liner
Why WebSockets	HTTP can't push; WS is a persistent full-duplex pipe either side can send on anytime.
Spectrum	Polling → long-polling → SSE (one-way) → WebSocket (two-way). Pick the simplest.
What it is	One long-lived TCP connection, started via HTTP, on ports 80/443. `ws://` / `wss://`.
Handshake	HTTP GET + `Upgrade: websocket` → `101 Switching Protocols`. Not 200.
Frames	FIN, opcode (text/binary/close/ping/pong), client→server masked; message-oriented, not byte-stream.
Lifecycle	open → messages → close (with a code). 1006 = died with no close frame.
Heartbeats	Ping/pong on a timer; missed pong = dead → close + clean up. Always run them.
Server	Accept upgrade, then a read loop; library does framing (gorilla / FastAPI / websockets).
Concurrency	One writer per connection; never write the same socket from many goroutines.
Hub	Central registry owns connections; register/unregister/broadcast. Always clean up.
Rooms	Group connections by name; broadcast to a group, not everyone.
Backpressure	Bounded per-client buffer; on overflow drop / disconnect / conflate. Never buffer unbounded.
Auth	At the handshake (cookie/query/subprotocol/first msg), because the browser WebSocket API can't set custom headers. Authz every message.
Security	`wss://` + validate `Origin` (CSWSH) + validate messages + size/rate limits.
Backplane	Connections are pinned per instance; Redis/Kafka pub/sub lets servers deliver across instances.
Load balancing	LB must speak WS; raise idle timeouts; affinity; drain (1001) on deploy.
Deploys	Every rolling deploy disconnects all clients — reconnection is mandatory.
Reconnection	Backoff + jitter, then re-sync missed state (resume token / HTTP fetch).
Delivery	Not exactly-once; add sequence IDs + idempotency where gaps matter.
vs others	SSE for one-way push; gRPC streaming for service↔service; polling for rare updates.
Pitfalls	Leaks, zombies, unbounded buffers, concurrent writes, no origin check, no backplane.
Design rule	Lightest transport that fits; bound everything; treat disconnection as normal.

The whole topic in one breath: WebSockets exist because HTTP can't let the server push, and polling is wasteful (§1) — but they're the heaviest option on a spectrum that includes polling and one-way SSE (§2). A WebSocket is a single persistent full-duplex TCP connection (§3) that begins as an HTTP Upgrade handshake answered with 101 (§4), then exchanges message-oriented frames (§5) through an open→message→close lifecycle (§6), kept honest by ping/pong heartbeats since dead connections look idle (§7). You build a server as accept-then-read-loop (§8), often act as a client too (§9), and manage many connections through a central hub (§10) with rooms for grouping (§11) and bounded buffers for backpressure so one slow client can't sink the server (§12). In production you authenticate at the handshake and authorize every message (§13), secure with wss:// and origin checks (§14), and — the hard part — scale stateful connections across instances with a pub/sub backplane (§15), behind a WebSocket-aware load balancer that drains on deploy (§16), with clients that reconnect with backoff and re-sync (§17). Above all: pick the lightest transport that fits (§18), avoid the resource-leak pitfalls (§19), and treat disconnection as the normal case (§20).

Grounded in MDN (developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) & RFC 6455 · gorilla/websocket & the websockets/FastAPI docs · Go 1.22+ / Python 3.11+ examples.