WebSockets, a two-way pipe the server can push down.
A first-principles walkthrough of real-time backend communication — from why HTTP's request–response shape can't push, through the WebSocket upgrade handshake, frames, and connection lifecycle, into building a server, managing thousands of connections, handling backpressure, and the hard part: scaling stateful connections across many instances with a pub/sub backplane. Written to explain not just what each piece does but why it exists and how it works underneath. Server and client code in both Go and Python.
The Problem WebSockets Solve
Plain HTTP has one shape: the client asks, the server answers (the HTTP chapter's request–response model). The server can never speak first; it has no way to push data to a client that hasn't just asked for it. That's fine for fetching a page or calling an API, but it breaks down the moment you need the server to notify the client as things happen: a new chat message, a live price tick, a notification, a collaborator's cursor moving, a match found. The server has the update; HTTP gives it no channel to deliver it.
Before WebSockets, you faked server push by having the client ask repeatedly — polling. Every few seconds the client sends "anything new?" and usually hears "no." This is wasteful on every axis: a flood of requests that mostly return nothing, the full overhead of HTTP headers and connection setup on each one, and latency bounded by your polling interval (poll every 5s and an event can sit undelivered for nearly 5s). WebSockets exist to replace that hack with a real persistent, two-way connection where either side can send at any time.
HTTP can't let the server initiate. WebSockets establish a single, long-lived, full-duplex connection so either side can send a message at any time, with no per-message request overhead and no polling delay. Everything else is how that pipe is opened, framed, managed, and scaled.
The Real-Time Spectrum
WebSockets aren't the only way to get server-to-client updates, and reaching for them reflexively is a common mistake. There's a spectrum of techniques, each a different point on the trade-off between simplicity and capability. Knowing all four lets you pick the simplest thing that meets the need (the full decision guide is §18).
| Technique | Direction | How it works | Best when |
|---|---|---|---|
| Short polling | client pulls | Repeated requests on a timer | Updates are rare & latency tolerance is high; you want zero infrastructure |
| Long polling | client pulls | Server holds the request open until it has data, then the client re-asks | You need push-like behavior but must work through any old proxy/client |
| SSE | server → client | A single long-lived HTTP response streaming text events | One-way server push (feeds, notifications) — simpler than WS |
| WebSocket | both ways | Persistent full-duplex connection upgraded from HTTP | True bidirectional, low-latency interaction (chat, games, collaboration) |
WebSockets are the most capable option and the most operationally demanding — persistent stateful connections that complicate scaling, load balancing, and deployment (Parts III–IV). If you only need the server to push to the client (a live feed, notifications, progress), SSE is dramatically simpler: it's just HTTP, it auto-reconnects, and it sails through proxies. Use the full duplex of WebSockets only when the client genuinely needs to send frequently too (§18).
What a WebSocket Actually Is
A WebSocket is a persistent, bidirectional, full-duplex communication channel over a single TCP connection, established through an HTTP request and then kept open. Unpack each word: persistent — opened once and held for the session, not per message; bidirectional / full-duplex — both sides can send simultaneously and independently, not taking turns; single TCP connection — one socket carries everything, on the same ports as HTTP (80/443), so it traverses firewalls that allow web traffic.
The clever design choice is that a WebSocket starts life as an HTTP request and then
upgrades (§4). This isn't an accident — it's what lets WebSockets reuse the existing web
infrastructure (ports, TLS, proxies, the same origin) instead of requiring a new port or protocol that
firewalls would block. Once upgraded, the connection stops speaking HTTP and starts speaking the WebSocket
framing protocol (§5). The URL scheme reflects this: ws:// (like http://) and
wss:// (TLS, like https://).
WebSockets predate and sit alongside HTTP/2. HTTP/2 has its own multiplexed streams (the gRPC chapter), and gRPC bidirectional streaming covers similar ground for service-to-service use. WebSockets remain the standard for browser-to-server real-time, precisely because browsers expose a clean WebSocket API but not raw HTTP/2 framing (the same reason browsers can't speak native gRPC — gRPC §19). The comparison across all of these is §18.
The Upgrade Handshake
Every WebSocket begins with a single, special HTTP request — the upgrade handshake (the
HTTP chapter's §19, in full). The client sends a normal-looking GET with headers that say "I'd
like to switch this connection to the WebSocket protocol." If the server agrees, it replies with status
101 Switching Protocols — not 200 — and from that instant
the connection is no longer HTTP; it's a raw WebSocket carrying frames (§5).
```
GET /ws/chat HTTP/1.1
Host: api.example.com
Upgrade: websocket                                  ← "switch this connection"
Connection: Upgrade                                 ← the Upgrade header is meaningful
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==         ← a random client nonce (base64)
Sec-WebSocket-Version: 13                           ← the protocol version
Sec-WebSocket-Protocol: chat.v1                     ← optional subprotocol(s) offered
Origin: https://app.example.com                     ← the browser sends this; SERVER must check (§14)

HTTP/1.1 101 Switching Protocols                    ← NOT 200: the upgrade succeeded
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=  ← proves the server spoke WebSocket
Sec-WebSocket-Protocol: chat.v1                     ← the subprotocol the server picked

# After 101, both sides stop speaking HTTP and start exchanging WebSocket frames.
```
What Sec-WebSocket-Accept proves
The server computes Accept by concatenating the client's Sec-WebSocket-Key with a
fixed, magic GUID defined by the spec, SHA-1 hashing it, and base64-encoding the result. This isn't security —
the GUID is public — it's a proof of protocol understanding: it confirms the responder is
a genuine WebSocket server and not some cache or proxy blindly echoing a 101. The client verifies
the value matches before treating the connection as a working WebSocket. Libraries do this for you; you almost
never compute it by hand.
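For the curious, the derivation fits in a few lines of Python's standard library. This sketch uses the fixed GUID from RFC 6455 and checks itself against the spec's sample key, which is the same key/accept pair shown in the handshake above:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every WebSocket server appends this exact string.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(accept_value("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Because the GUID is public, anyone can compute this; it only shows the responder understood the WebSocket handshake, exactly as described above.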
Sec-WebSocket-Protocol lets client and server negotiate an application-level subprotocol (e.g.
chat.v1, or a standard like graphql-ws) — a clean versioning seam. Note one
important limitation that shapes auth (§13): the browser WebSocket API doesn't let you set
arbitrary HTTP headers on the handshake (you can't add Authorization: Bearer), so credentials
must travel another way — a query parameter, a cookie, the subprotocol field, or a first message after
connect.
Frames — the Wire Format
After the handshake, data doesn't flow as a raw byte stream — it's chopped into frames, small structured units with a compact header. Framing is what lets one connection carry discrete messages (so the receiver knows where one ends and the next begins), distinguish text from binary, send control signals like close and ping, and split a large message across multiple frames. You rarely touch frames directly — libraries expose "send message / receive message" — but understanding the structure explains masking, message types, and control frames.
Text vs binary, and why client frames are masked
- Two data types. A frame is either text (UTF-8 — JSON usually rides here) or binary (raw bytes — Protobuf, images, custom formats). The library hands you a string or bytes accordingly.
- Masking is mandatory client→server, forbidden server→client. Every client (browser or not) XOR-masks each frame it sends with a fresh random 4-byte key. This isn't encryption (use wss:// for that, §14); it's a security mitigation so a malicious page can't craft bytes that confuse old intermediary proxies into mis-parsing the stream as something else (cache poisoning). Server frames are never masked. Libraries handle this automatically.
- Fragmentation. A large message can be split into a sequence of frames: the first carries the real opcode (text or binary), every later frame uses opcode 0x0 (continuation), and FIN=0 is set on all but the last. This lets senders stream without buffering the whole message first.
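To make the header concrete, here is a sketch of encoding and decoding a single frame, following the RFC 6455 layout: the FIN bit and 4-bit opcode in the first byte; the MASK bit and a 7-bit length in the second (126 and 127 escape to 16- and 64-bit extended lengths); then the optional 4-byte masking key and the payload. It is a teaching sketch, not a library: it ignores the RSV bits and extensions, assumes a complete frame is already in the buffer, and enforces no size limits.

```python
import os
import struct

# Opcodes from RFC 6455.
OP_CONT, OP_TEXT, OP_BINARY = 0x0, 0x1, 0x2
OP_CLOSE, OP_PING, OP_PONG = 0x8, 0x9, 0xA


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each byte with the 4-byte key; applying it twice restores the original."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def encode_frame(payload: bytes, opcode: int = OP_TEXT, *,
                 fin: bool = True, masked: bool = True) -> bytes:
    """Build one frame. Clients must send masked=True; servers send masked=False."""
    head = bytes([(0x80 if fin else 0x00) | opcode])  # FIN bit + opcode (RSV bits 0)
    mask_bit = 0x80 if masked else 0x00
    n = len(payload)
    if n <= 125:
        head += bytes([mask_bit | n])  # length fits in 7 bits
    elif n <= 0xFFFF:
        head += bytes([mask_bit | 126]) + struct.pack("!H", n)  # 16-bit extended length
    else:
        head += bytes([mask_bit | 127]) + struct.pack("!Q", n)  # 64-bit extended length
    if not masked:
        return head + payload
    key = os.urandom(4)  # fresh random key for every frame
    return head + key + apply_mask(payload, key)


def decode_frame(data: bytes):
    """Parse one complete frame: returns (fin, opcode, payload, bytes_consumed)."""
    fin = bool(data[0] & 0x80)
    opcode = data[0] & 0x0F
    masked = bool(data[1] & 0x80)
    n = data[1] & 0x7F
    pos = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", data, pos)
        pos += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", data, pos)
        pos += 8
    key = b""
    if masked:
        key = data[pos:pos + 4]
        pos += 4
    payload = data[pos:pos + n]
    if masked:
        payload = apply_mask(payload, key)
    return fin, opcode, payload, pos + n


frame = encode_frame(b'{"type":"hello"}')  # masked, the way a client sends it
print(decode_frame(frame)[:3])              # (True, 1, b'{"type":"hello"}')
```

Note how little the mask costs the receiver: it is a byte-wise XOR, so unmasking is the same operation as masking.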
A key convenience over raw TCP: WebSockets are message-oriented, not byte-stream-oriented. You send a message and the other side receives that message whole, not an arbitrary chunk of a byte stream you have to re-delimit yourself. Framing does the delimiting. (Under the hood it's still TCP, so the bytes are ordered and reliable — you just get message boundaries on top.)
The Connection Lifecycle
A WebSocket has a clear life: open → exchange messages → close, with events at each
stage your code hooks into. The open completes after the handshake (§4). Then messages flow freely. Closing
is itself a small handshake: one side sends a close frame (opcode 0x8) carrying a
status code and optional reason, the other echoes a close frame, and both then shut the TCP connection. A clean
close lets each side know why the connection ended.
| Close code | Meaning |
|---|---|
| 1000 | Normal closure: done as intended |
| 1001 | Going away: server shutting down or client navigating away |
| 1006 | Abnormal: connection dropped with no close frame (network died, crash). You never send this; you observe it. |
| 1008 | Policy violation |
| 1009 | Message too big |
| 1011 | Internal server error |
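The close frame's body is simple enough to build by hand, which shows where those codes actually live. Per RFC 6455 it is an optional 2-byte big-endian status code followed by a UTF-8 reason, and because control frames are capped at 125 bytes the reason can be at most 123. A minimal sketch (libraries do this for you):

```python
import struct


def encode_close_payload(code: int, reason: str = "") -> bytes:
    """Body of a close frame: 2-byte big-endian status code + UTF-8 reason."""
    reason_bytes = reason.encode("utf-8")
    # Control frames carry at most 125 bytes, leaving 123 for the reason.
    if len(reason_bytes) > 123:
        raise ValueError("close reason too long")
    if code in (1005, 1006, 1015):
        # Reserved codes: observed locally, never sent on the wire.
        raise ValueError(f"close code {code} must not be sent")
    return struct.pack("!H", code) + reason_bytes


def decode_close_payload(body: bytes) -> tuple:
    """Return (code, reason); an empty body means no status code was given (1005)."""
    if not body:
        return 1005, ""
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")


print(encode_close_payload(1000, "bye"))  # b'\x03\xe8bye'
```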
The clean close handshake only happens when both sides cooperate. In reality connections drop ungracefully
all the time — a laptop sleeps, Wi-Fi flaps, a phone switches networks, a NAT times out — and you
get no close frame, just a dead socket (code 1006 or a read error, often noticed only
much later). You cannot rely on a clean close to detect a gone client. That's exactly why heartbeats exist
(§7) and why clients must reconnect (§17).
Ping/Pong & Heartbeats
Because a dead connection often looks identical to an idle one (no bytes either way), you need an active way to tell "still alive" from "silently gone." The protocol provides ping and pong control frames for exactly this: one side sends a ping, the other must answer with a pong. A heartbeat is the pattern of sending pings on a timer and treating a missed pong (within a deadline) as a dead connection to be closed and cleaned up. This is the same liveness problem as Kubernetes probes (containerization §17) and Kafka consumer keepalive, solved with the same idea: don't assume health, verify it.
A real-time server without heartbeats slowly fills with zombie connections — clients that vanished but whose sockets the server still holds, leaking memory and goroutines/tasks (§19). Set a ping interval and a pong deadline, drop connections that miss it, and free their resources. It's not optional for production; it's how you keep the connection table honest.
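Most libraries ship this loop already: the Python websockets library pings on its own (its `ping_interval` and `ping_timeout` options both default to 20 seconds), and with gorilla the usual pattern is a ticker that writes pings while a `SetPongHandler` extends `SetReadDeadline`. The logic itself is small. This asyncio sketch abstracts the connection as two hypothetical callables, `ping` (sends a ping and resolves when the pong arrives) and `close`; the 1011 close code is an assumed choice.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat(
    ping: Callable[[], Awaitable[None]],      # sends a ping; resolves when the pong arrives
    close: Callable[[int], Awaitable[None]],  # closes the connection with a status code
    interval: float = 20.0,
    timeout: float = 20.0,
) -> None:
    """Ping every `interval` seconds; if a pong misses the `timeout` deadline, close."""
    while True:
        await asyncio.sleep(interval)
        try:
            await asyncio.wait_for(ping(), timeout)
        except asyncio.TimeoutError:
            # Treat the peer as gone. Our close frame may never arrive; the point
            # is to tear down the connection and free its resources locally.
            await close(1011)
            return
```

You would run `heartbeat` as a task next to the read loop and cancel it when the loop exits. With the websockets library, `ping` could be a small coroutine that does `pong_waiter = await ws.ping()` followed by `await pong_waiter`.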
A WebSocket Server
A server endpoint does three things: accept the upgrade (§4), run a loop that reads messages, and write messages
back. In Go the standard library doesn't ship a WebSocket implementation, so you use a library:
gorilla/websocket (the long-time standard) or coder/websocket (formerly nhooyr.io/websocket).
In Python, frameworks like FastAPI/Starlette expose WebSockets directly, and the websockets library
is the standalone standard. Here is a minimal echo server — the "hello world" of WebSockets.
```go
// Minimal echo server using github.com/gorilla/websocket.
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// CheckOrigin guards against cross-site hijacking: DO NOT leave it open (§14).
	CheckOrigin: func(r *http.Request) bool {
		return r.Header.Get("Origin") == "https://app.example.com"
	},
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil) // performs the 101 handshake (§4)
	if err != nil {
		return // upgrader already wrote an HTTP error response
	}
	defer conn.Close() // always close → frees the socket (§19)

	for { // the READ LOOP: one per connection
		mt, msg, err := conn.ReadMessage()
		if err != nil {
			break // client closed or connection died (§6) → exit, cleanup via defer
		}
		// echo it straight back, keeping the message type (text or binary)
		if err := conn.WriteMessage(mt, msg); err != nil {
			break
		}
	}
}

func main() {
	http.HandleFunc("/ws", wsHandler) // a normal HTTP route that upgrades
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

```python
# FastAPI / Starlette expose WebSockets natively (the `websockets` lib is an alternative).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws")
async def ws_handler(ws: WebSocket):
    # Validate the origin before accepting (§14); closing before accept() rejects the handshake.
    if ws.headers.get("origin") != "https://app.example.com":
        await ws.close(code=1008)
        return
    await ws.accept()  # performs the 101 handshake (§4)
    try:
        while True:  # the READ LOOP: one coroutine per connection
            msg = await ws.receive_text()  # awaits the next message
            await ws.send_text(msg)        # echo it straight back
    except WebSocketDisconnect:
        pass  # client closed or connection died (§6)
    # FastAPI cleans up the connection when the coroutine returns

# Run with: uvicorn app:app (install uvicorn[standard] so uvicorn has a WebSocket implementation)
```

Go gives each connection a goroutine running a blocking read loop; Python gives each connection an async coroutine awaiting messages. Same structure (accept, loop reading, write), expressed in each language's concurrency model.
A subtle but critical rule (especially in Go with gorilla): concurrent writes to one connection are not safe, and you typically want one goroutine reading and another writing. The standard pattern is a dedicated writer goroutine fed by a channel, so all writes are serialized. Don't write to the same connection from multiple goroutines without synchronization — it corrupts the frame stream. The hub pattern (§10) builds exactly this structure.
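The same single-writer rule carries over to asyncio as one writer task draining a bounded queue. This is a sketch, not a library API: `ConnectionWriter` is a hypothetical name, and `send` stands for whatever actually writes a frame (for example a bound `ws.send_text`). Other tasks only call `enqueue`, which never blocks; a full queue is reported back so the caller can apply a slow-client policy (§12).

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ConnectionWriter:
    """Serializes all writes to one connection through a single writer task."""

    def __init__(self, send: Callable[[str], Awaitable[None]], maxsize: int = 256):
        # Must be constructed inside a running event loop (it starts a task).
        self._send = send
        self._queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize)
        self._task = asyncio.create_task(self._run())

    def enqueue(self, message: str) -> bool:
        """Queue a message from any task; returns False if the queue is full."""
        try:
            self._queue.put_nowait(message)
            return True
        except asyncio.QueueFull:
            return False  # caller decides: drop, disconnect, or conflate (§12)

    async def _run(self) -> None:
        # The ONLY place that writes to the connection.
        while (message := await self._queue.get()) is not None:
            await self._send(message)

    async def close(self) -> None:
        await self._queue.put(None)  # sentinel: flush what's queued, then stop
        await self._task
```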
A WebSocket Client
A backend is often a WebSocket client too — consuming a real-time feed from another service,
bridging systems, or in tests. The client side: dial the ws:///wss:// URL (which does
the handshake), then read and write messages. The shape mirrors the server.
```go
// Client sketch using github.com/gorilla/websocket.
package main

import (
	"log"
	"net/http"
	"os"
	"time"

	"github.com/gorilla/websocket"
)

func runClient(token string) {
	// Dial performs the upgrade handshake; a backend client may set headers (§13).
	h := http.Header{"Authorization": {"Bearer " + token}}
	conn, _, err := websocket.DefaultDialer.Dial("wss://api.example.com/ws", h)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// reader goroutine: the only reader (one reader + one writer is safe in gorilla)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			_, msg, err := conn.ReadMessage()
			if err != nil {
				return // also ends here when the server echoes our close frame
			}
			log.Printf("recv: %s", msg)
		}
	}()

	// send a message (this goroutine is the only writer)
	conn.WriteMessage(websocket.TextMessage, []byte(`{"type":"hello"}`))

	// graceful close: send a close frame, then wait briefly for the echo (§6)
	conn.WriteMessage(websocket.CloseMessage,
		websocket.FormatCloseMessage(websocket.CloseNormalClosure, "bye"))
	select {
	case <-done: // close handshake completed
	case <-time.After(time.Second): // don't wait forever on a dead peer
	}
}

func main() {
	runClient(os.Getenv("API_TOKEN")) // hypothetical env var holding the token
}
```

```python
# `websockets` library: clean async client
import asyncio
import os

import websockets


async def run_client(token: str):
    # A backend client can set headers (§13); browsers can't.
    # Recent websockets releases call this `additional_headers`; older ones used `extra_headers`.
    async with websockets.connect(
        "wss://api.example.com/ws",
        additional_headers={"Authorization": f"Bearer {token}"},
    ) as ws:  # context manager → handshake + clean close
        await ws.send('{"type":"hello"}')  # send a message
        async for msg in ws:               # iterate incoming messages until the connection closes
            print("recv:", msg)
    # leaving the `async with` block sends a close frame and shuts down (§6)


asyncio.run(run_client(os.environ["API_TOKEN"]))  # hypothetical env var holding the token
```

When your service is the client, treat the feed like any unreliable upstream: the connection will drop (§6), so wrap the connect-and-read in a reconnect loop with backoff (§17), and make processing idempotent if the feed can re-deliver on reconnect (the same at-least-once discipline as the Kafka chapter). A backend WebSocket client without reconnection logic silently stops receiving the first time the network hiccups.
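One way to make processing idempotent is to skip message IDs you have already handled. This sketch assumes the upstream stamps every message with a unique ID (an assumption about the feed's format; WebSockets themselves provide nothing like it). The remembered set is bounded, LRU-style, so a long-running client keeps flat memory at the cost of forgetting very old IDs.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers recent message IDs so re-delivered messages can be skipped."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def first_time(self, msg_id: str) -> bool:
        """True if this ID is new (process it); False if it's a duplicate (skip it)."""
        if msg_id in self._seen:
            self._seen.move_to_end(msg_id)  # refresh recency
            return False
        self._seen[msg_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True
```

In the read loop, a handler would call `first_time(msg["id"])` and return early on `False` before doing any side effects.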
Connection Management
One connection is trivial; the real work is managing many. A server holding thousands of live connections needs a central place that tracks who's connected so it can route and broadcast messages. The canonical solution is the hub (or "connection manager"): a single owner of the set of active connections, with channels/locks to register, unregister, and send — sidestepping the concurrent-write hazard from §8 by funneling everything through one coordinator.
```go
// Client is one connection's handle as the hub sees it. A full version also
// holds the *websocket.Conn and runs a read goroutine and a write goroutine (§8).
type Client struct {
	send chan []byte // buffered outbound queue, drained by the client's writer goroutine
}

type Hub struct {
	clients    map[*Client]bool
	register   chan *Client
	unregister chan *Client
	broadcast  chan []byte
}

func NewHub() *Hub {
	return &Hub{
		clients:    make(map[*Client]bool),
		register:   make(chan *Client),
		unregister: make(chan *Client),
		broadcast:  make(chan []byte),
	}
}

// One goroutine owns the map → all mutation is serialized here (no locks needed).
func (h *Hub) Run() {
	for {
		select {
		case c := <-h.register:
			h.clients[c] = true
		case c := <-h.unregister:
			if _, ok := h.clients[c]; ok {
				delete(h.clients, c)
				close(c.send) // tell the client's writer goroutine to stop
			}
		case msg := <-h.broadcast:
			for c := range h.clients {
				select {
				case c.send <- msg: // queue into the client's buffered channel
				default: // buffer full → slow client; drop it (§12)
					delete(h.clients, c)
					close(c.send)
				}
			}
		}
	}
}
```

```python
from fastapi import WebSocket


class Hub:
    def __init__(self):
        self.clients: set[WebSocket] = set()  # the active connection set

    async def register(self, ws: WebSocket):
        await ws.accept()
        self.clients.add(ws)

    def unregister(self, ws: WebSocket):
        self.clients.discard(ws)  # idempotent removal

    async def broadcast(self, message: str):
        dead = []
        # Iterate over a snapshot: each `await` lets other tasks run, and they
        # may register or unregister while this loop is suspended.
        for ws in list(self.clients):
            try:
                await ws.send_text(message)  # fan out to everyone
            except Exception:
                dead.append(ws)  # send failed → connection is gone
        for ws in dead:
            self.unregister(ws)  # clean up on the way out


hub = Hub()
# asyncio runs one task at a time, so the set needs no lock; but every `await`
# is a switch point, so never iterate the live set across an await.
```

Two things separate a toy from a real server. Concurrency safety: never touch the shared connection set from many goroutines/tasks without coordination (Go: one owner goroutine + per-client send channel; Python asyncio: no lock, but iterate over a snapshot because every await lets other tasks run). Cleanup: every path that ends a connection must remove it from the registry and free its resources, or you leak (§19). Register on connect, and always unregister on disconnect, including error and panic paths.
Broadcasting & Rooms
Real apps rarely send to everyone — they send to a relevant subset: the members of one chat room, the subscribers to one stock symbol, the players in one game. The pattern is rooms (a.k.a. channels or topics): connections subscribe to named groups, and you broadcast to a group rather than the whole server. Concretely, the hub holds a map from room name to the set of connections in it.
On a single server, a room is simply map[roomName] → set of connections, and broadcasting
iterates that set. The catch arrives with multiple servers (§15): a room's members may be spread across
different instances, so a message produced on one server must reach members connected to another. The
in-memory room map alone can't do that — which is the whole reason the backplane exists. Build rooms
simply first; reach for the backplane when you outgrow one instance.
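A single-server version of rooms can be sketched in a few lines. Here each connection is represented by its async send callable (for example a bound `ws.send_text`), an assumption that keeps the sketch framework-neutral; `Rooms` and its method names are illustrative.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Send = Callable[[str], Awaitable[None]]  # e.g. a bound ws.send_text


class Rooms:
    """Single-server rooms: room name → set of connections (their send callables)."""

    def __init__(self) -> None:
        self._members: "dict[str, set[Send]]" = defaultdict(set)

    def join(self, room: str, conn: Send) -> None:
        self._members[room].add(conn)

    def leave(self, room: str, conn: Send) -> None:
        members = self._members.get(room)
        if members is None:
            return
        members.discard(conn)
        if not members:
            del self._members[room]  # drop empty rooms so the map doesn't grow forever

    def leave_all(self, conn: Send) -> None:
        """Call on disconnect so the connection vanishes from every room."""
        for room in list(self._members):
            self.leave(room, conn)

    async def broadcast(self, room: str, message: str) -> None:
        # Snapshot the members: each await lets other tasks join or leave meanwhile.
        for conn in list(self._members.get(room, ())):
            await conn(message)


async def demo() -> None:
    rooms = Rooms()
    inbox: "list[str]" = []

    async def alice(message: str) -> None:
        inbox.append(message)

    rooms.join("lobby", alice)
    await rooms.broadcast("lobby", "hi")
    await rooms.broadcast("ops", "ignored")  # nobody is in "ops"
    print(inbox)                              # ['hi']


asyncio.run(demo())
```

Everything here lives in one process's memory, which is exactly the limitation the backplane (§15) removes.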
Backpressure & Slow Consumers
A failure mode unique to push systems: what happens when you produce messages faster than a client can receive them? A phone on a weak connection, a browser tab throttled in the background — the client drains slowly while the server keeps generating data for it. Without a plan, that data piles up in a per-client buffer that grows without bound, and one slow client can exhaust the server's memory. This is backpressure, and you must decide a policy.
The standard implementation (visible in the hub of §10): give each client a bounded send queue. When you try to enqueue and it's full, you don't block the broadcaster (that would let one slow client stall everyone) — instead you apply a policy:
- Drop messages — for lossy data where only the latest matters (a live position, a ticker), discard the overflow or keep only the newest. The slow client misses some updates but the server stays healthy.
- Disconnect the client — for data where gaps are unacceptable, close the slow connection and let it reconnect and re-sync (§17). Better to drop one client than degrade all.
- Conflate / batch — collapse multiple pending updates into one (send the latest snapshot rather than every intermediate step).
The cardinal rule of push systems: never buffer unboundedly per connection. Every client gets a fixed-size queue and an explicit overflow policy (drop, disconnect, or conflate). This is the real-time analogue of Kafka's slow-consumer handling and the backpressure concerns in any streaming system — one slow consumer must never be able to take down the producer.
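The three policies can be sketched as a bounded per-client outbox that any writer task drains. `BoundedOutbox`, `Overflow`, and `SlowClient` are illustrative names, the limit is an assumed tuning knob, and conflation assumes each message can be keyed (by symbol, entity ID, and so on). The Go hub's `default:` branch in §10 is the disconnect policy in this vocabulary.

```python
from collections import deque
from enum import Enum
from typing import Any, Hashable, Optional


class Overflow(Enum):
    DROP_OLDEST = "drop_oldest"  # lossy data: keep only the newest `limit` messages
    DISCONNECT = "disconnect"    # gaps unacceptable: kick the client, let it re-sync
    CONFLATE = "conflate"        # keep only the latest pending value per key


class SlowClient(Exception):
    """Tells the caller to close this connection instead of buffering more."""


class BoundedOutbox:
    def __init__(self, limit: int, policy: Overflow) -> None:
        self.limit = limit
        self.policy = policy
        self._queue: deque = deque()
        self._latest: dict = {}  # CONFLATE only: key → newest pending message

    def push(self, message: Any, key: Optional[Hashable] = None) -> None:
        """Never blocks the broadcaster: applies the overflow policy instead."""
        if self.policy is Overflow.CONFLATE:
            if key not in self._latest and len(self._latest) >= self.limit:
                raise SlowClient("too many distinct pending keys")
            self._latest[key] = message  # overwrites an older pending value for this key
            return
        if len(self._queue) < self.limit:
            self._queue.append(message)
        elif self.policy is Overflow.DROP_OLDEST:
            self._queue.popleft()
            self._queue.append(message)
        else:
            raise SlowClient("send queue full")

    def drain(self) -> list:
        """Hand everything pending to the writer (for CONFLATE, in first-pending key order)."""
        if self.policy is Overflow.CONFLATE:
            pending = list(self._latest.values())
            self._latest.clear()
            return pending
        pending = list(self._queue)
        self._queue.clear()
        return pending
```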
Authentication
Authenticating a WebSocket is trickier than a normal request because of a browser limitation from §4: the
JavaScript WebSocket constructor can't set custom headers, so the usual
Authorization: Bearer isn't available from a browser. You authenticate at the handshake
(it's still an HTTP request — the server can read cookies, the URL, and the subprotocol) and you do it
before calling Upgrade/accept. The principle from the auth chapter
holds: verify identity at the door; an unauthenticated socket should never be upgraded.
| Approach | How | Notes |
|---|---|---|
| Cookie / session | Browser sends the auth cookie on the handshake automatically | Natural for same-site web apps; pair with strict origin checks (§14) to resist CSWSH |
| Token in query string | wss://host/ws?token=JWT | Works everywhere, but the token can land in logs/history — use short-lived tokens |
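| Subprotocol field | Smuggle the token in Sec-WebSocket-Protocol | Avoids the URL; a common JWT-over-WS trick. The server must still echo back one of the offered values (not necessarily the token), or browsers fail the handshake |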
| First message | Connect, then send an auth message before anything else | Flexible; the server must reject/close if auth doesn't arrive promptly |
| Header (non-browser) | Backend clients can set Authorization (§9) | Cleanest — but only for non-browser clients |
```go
func wsHandler(w http.ResponseWriter, r *http.Request) {
	// AUTHENTICATE FIRST, before upgrading. Reject with a normal HTTP error.
	token := r.URL.Query().Get("token") // or read a cookie / subprotocol
	user, err := verifyJWT(token)       // your auth logic (auth chapter)
	if err != nil {
		http.Error(w, "unauthorized", http.StatusUnauthorized) // 401, no upgrade
		return
	}
	// Only now perform the upgrade; attach the identity to the connection.
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	serve(conn, user) // your connection loop: every message is now tied to `user`
}
```

```python
@app.websocket("/ws")
async def ws_handler(ws: WebSocket):
    # AUTHENTICATE FIRST, before accept(). Close with a policy code if it fails.
    token = ws.query_params.get("token")  # or a cookie / subprotocol
    user = verify_jwt(token)               # your auth logic (auth chapter); None on failure
    if user is None:
        # 1008 = policy violation. Closing before accept() rejects the handshake
        # (uvicorn answers with HTTP 403, so the client never sees the 1008).
        await ws.close(code=1008)
        return
    await ws.accept()      # only now complete the handshake
    await serve(ws, user)  # your connection loop: messages are tied to `user`
```

Authentication at the handshake establishes who the connection belongs to. But a long-lived socket outlives a single action, so you must still authorize each message against what that user is allowed to do (can they post to this room? send this command?): the same authz-on-every-action rule as REST. Also consider that a token can expire mid-connection: long sessions need a re-auth or token-refresh strategy, or you'll have a connection authenticated by a credential that's no longer valid.
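Per-message authorization and the expiry check can share one guard that the read loop calls before dispatching. A sketch under stated assumptions: `Session` is a hypothetical object built at the handshake from the verified token (its `exp` claim and the rooms the user may access), and `join`/`post` are hypothetical message types; anything the guard doesn't recognize is denied by default.

```python
import time
from dataclasses import dataclass
from typing import Optional

ROOM_ACTIONS = {"join", "post"}  # hypothetical message types that target a room


@dataclass(frozen=True)
class Session:
    """Built once at the handshake from the verified token (hypothetical shape)."""
    user_id: str
    token_exp: float   # the token's `exp` claim, in Unix seconds
    rooms: frozenset   # rooms this user may join or post to


class TokenExpired(Exception):
    """The credential lapsed mid-connection: close (e.g. 1008) so the client re-authenticates."""


def is_allowed(session: Session, msg: dict, now: Optional[float] = None) -> bool:
    """Call on EVERY inbound message, not only at the handshake."""
    now = time.time() if now is None else now
    if now >= session.token_exp:
        raise TokenExpired(session.user_id)
    if msg.get("type") in ROOM_ACTIONS:
        room = msg.get("room")
        return isinstance(room, str) and room in session.rooms
    return False  # default-deny message types this guard doesn't know about
```

Returning `False` lets the server answer with an error message and keep the socket open, while an expired token ends the connection outright.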
Security
WebSockets inherit the web's threat model plus a few of their own. The security chapter's principles apply directly; these are the WebSocket-specific essentials.
| Concern | What & why |
|---|---|
| Use wss:// (TLS) | Plain ws:// is unencrypted: readable and tamperable on the wire. Always use wss:// in production, exactly as HTTPS for HTTP (TLS, the HTTP chapter §20). |
| Validate Origin (CSWSH) | The browser sends an Origin header on the handshake; the server must check it. Skipping this enables Cross-Site WebSocket Hijacking: a malicious site opening an authenticated socket using the victim's cookies (the WS cousin of CSRF). Allowlist your own origins. |
| Validate every message | Treat inbound frames as untrusted input (the validation chapter): check structure, types, sizes; reject malformed messages. The connection being authenticated doesn't make its payloads safe. |
| Cap message size | Set a max frame/message size so a client can't send a giant payload to exhaust memory (close with 1009 if exceeded). |
| Rate-limit | A persistent connection can flood you with messages; rate-limit per connection/user to prevent abuse and accidental loops. |
| Bound connections per user | Limit concurrent connections per identity so one client can't open thousands and exhaust the connection table (§19). |
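Size caps belong in the library, so oversized frames are rejected before they are buffered (gorilla has `conn.SetReadLimit`; the websockets library has a `max_size` option). An application-level inbound guard still helps as a backstop and is where rate limiting and shape checks live. A sketch, assuming JSON text messages with a string `type` field; the limits, the close codes chosen per violation, and the `Reject` exception are illustrative.

```python
import json
import time
from typing import Callable

MAX_MESSAGE_BYTES = 64 * 1024  # assumed limit; tune per application


class TokenBucket:
    """Per-connection rate limit: `rate` messages/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Reject(Exception):
    def __init__(self, code: int, reason: str):
        super().__init__(reason)
        self.code = code  # close code to send back


def check_inbound(raw: str, bucket: TokenBucket) -> dict:
    """Validate one inbound text message; raise Reject(close_code, reason) on abuse."""
    if len(raw.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise Reject(1009, "message too big")
    if not bucket.allow():
        raise Reject(1008, "rate limit exceeded")
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        raise Reject(1008, "malformed JSON") from None
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        raise Reject(1008, "expected an object with a string 'type'")
    return msg
```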
CheckOrigin is a trap
The single most common WebSocket security hole: leaving origin checking disabled. Gorilla's default
CheckOrigin rejects cross-origin requests, but countless tutorials tell you to override it with
return true — which opens you to Cross-Site WebSocket Hijacking for any cookie-authenticated
endpoint. Always allowlist the specific origins you trust (as in the §8 server), never
blanket-allow. If you authenticate with bearer tokens rather than cookies the risk is lower, but origin
validation is still the right default.
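A sketch of the allowlist check in Python (the `ALLOWED_ORIGINS` set and the `allow_missing` switch are illustrative). The detail that matters is exact matching: a `startswith` or substring test would happily accept `https://app.example.com.evil.test`.

```python
from typing import Optional

# Exact "scheme://host[:port]" values, lowercase, as browsers serialize Origin.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def origin_allowed(origin: Optional[str], allow_missing: bool = False) -> bool:
    """Exact-match the handshake's Origin header against the allowlist."""
    if origin is None:
        # Browsers send Origin on WebSocket handshakes, so a missing header points to a
        # non-browser client, which can send any Origin it likes anyway. Policy choice.
        return allow_missing
    return origin.lower() in ALLOWED_ORIGINS
```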
Scaling — the Backplane
This is the defining hard problem of WebSockets, and the reason they complicate architecture far more than stateless HTTP. A WebSocket connection is stateful and pinned to one server: the socket lives in the memory of the specific instance the client connected to. The moment you run more than one instance — which you must, for capacity and availability — a painful question appears: if user A is connected to server 1 and user B (in the same chat room) is connected to server 2, how does A's message reach B? Server 1 has no access to server 2's connections.
The solution is a backplane (a.k.a. pub/sub adapter): an external message system that all server instances connect to. When a server needs to deliver a message to a client that may be on any instance, it publishes the message to the backplane; every instance is subscribed, receives it, and forwards it to whichever of its local connections should get it. The instances no longer need to know about each other's connections — they coordinate through the shared bus. Common backplanes:
- Redis Pub/Sub — the most common choice: simple, fast, low-latency, fire-and-forget. Perfect when you only need live fan-out and don't need history (a disconnected user simply misses messages).
- Kafka — when you also want durability, replay, and the events to feed other systems (the Kafka chapter): publish events to a topic, every WS instance consumes and forwards. Heavier, but the messages become part of your durable event stream.
- NATS / cloud pub/sub — other fast messaging options with similar mechanics.
The backplane turns "N servers each with their own islands of connections" into "N servers that collectively behave as one." It's the same decoupling idea as the Kafka chapter applied to live delivery: a publisher doesn't need to know which instance hosts a recipient — it publishes, and whoever holds the connection delivers. You cannot horizontally scale WebSockets without something playing this role; designing it in early is far easier than retrofitting it.
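The publish-then-fan-out flow can be simulated in one process. In this sketch `InMemoryBus` stands in for Redis Pub/Sub (with Redis the equivalents would be the PUBLISH and SUBSCRIBE commands), the `room:<name>` channel naming is an assumption, and `Instance.delivered` stands in for writes to real sockets.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBus:
    """Stand-in for Redis Pub/Sub or NATS: every subscriber of a channel gets every message."""

    def __init__(self) -> None:
        self._subs: "dict[str, list[Callable[[str], None]]]" = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> None:
        for handler in list(self._subs[channel]):
            handler(message)


class Instance:
    """One WebSocket server. It only knows its *local* connections."""

    def __init__(self, name: str, bus: InMemoryBus) -> None:
        self.name = name
        self.bus = bus
        self.local: "dict[str, set[str]]" = defaultdict(set)  # room → local user IDs
        self.delivered: "list[tuple[str, str]]" = []            # (user, message) "socket writes"
        self._subscribed: "set[str]" = set()

    def connect(self, user: str, room: str) -> None:
        self.local[room].add(user)
        if room not in self._subscribed:  # subscribe once per room that has local members
            self.bus.subscribe(f"room:{room}", lambda m, r=room: self._deliver(r, m))
            self._subscribed.add(room)

    def send_to_room(self, room: str, message: str) -> None:
        # Never deliver directly: publish, and let every instance (this one too) fan out.
        self.bus.publish(f"room:{room}", message)

    def _deliver(self, room: str, message: str) -> None:
        for user in self.local.get(room, ()):
            self.delivered.append((user, message))
```

The design choice worth copying is that an instance never writes to a room directly: local and remote members are reached through the same subscription path, so adding instances changes nothing in the sending code.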
Load Balancing & Sticky Sessions
Putting a load balancer in front of WebSocket servers has its own wrinkles, because the connection is long-lived and stateful rather than a quick request. Three things matter: the LB must support WebSockets, connections usually need affinity, and deploys must drain connections gracefully.
- The LB must speak WebSocket. It has to pass through the Upgrade handshake and then keep the connection open (an L7/HTTP-aware proxy, or L4 TCP pass-through). Most modern LBs (Nginx, HAProxy, cloud ALBs, the Kubernetes Ingress controllers of chapter 21) support this, but you usually have to raise their idle timeouts: a default 60s idle timeout will kill quiet WebSockets, which is another reason for heartbeats (§7).
- Sticky sessions / affinity. Once a client is connected to instance 1, all its frames must keep going to instance 1 (that's where its socket lives). Within a single TCP connection that's automatic (one connection = one backend), but reconnections and any per-message routing want session affinity so a client returns to a consistent instance. With a backplane (§15) affinity matters less for delivery (any instance can publish/receive), but it still affects local state.
- Connection draining on deploy. The big one: deploying a new version (the rolling updates of chapter 21 §18) terminates instances, and every WebSocket on a terminating instance drops. You must drain gracefully: stop accepting new connections, send a close frame (code 1001, "going away") so clients reconnect cleanly to a healthy instance, and allow time before SIGKILL (graceful shutdown chapter). Even so, a deploy causes a reconnection storm, which is why robust client reconnection (§17) is mandatory.
Unlike stateless HTTP — where a rolling deploy is invisible — rolling a WebSocket fleet drops every
live connection as old instances retire. This is fundamental to stateful connections, not a bug. Plan for it:
graceful drain with 1001 on the server, exponential-backoff reconnection with jitter on the
client (§17, to avoid a thundering-herd reconnect), and message design that tolerates a brief gap and
re-sync. Treat reconnection as the normal case, not the exception.
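The server half of that plan can be sketched as a drain routine you call from the SIGTERM handler once the instance has stopped accepting connections. It assumes each live connection is represented by an async close callable taking a code and a reason (for example a thin wrapper around your framework's close method); the grace period is an assumed value that must fit inside the orchestrator's termination window.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def drain(
    closers: Iterable[Callable[[int, str], Awaitable[None]]],
    grace_seconds: float = 10.0,
) -> int:
    """Close every live connection with 1001 "going away"; wait at most grace_seconds.

    Returns how many closes did not finish in time.
    """
    tasks = [asyncio.create_task(close(1001, "server restarting")) for close in closers]
    if not tasks:
        return 0  # asyncio.wait() rejects an empty collection
    _done, pending = await asyncio.wait(tasks, timeout=grace_seconds)
    for task in pending:
        task.cancel()  # SIGKILL would cut these anyway; stop waiting on them
    return len(pending)
```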
Reconnection & Resilience
Given everything above — connections drop silently (§6), deploys disconnect everyone (§16), networks are flaky — a real-time client that doesn't automatically reconnect is broken by design. Resilience lives mostly on the client, with server-side support for re-syncing missed state. (Note: SSE gives you reconnection for free — another reason to prefer it when you only need server push, §2.)
The resilience checklist
- Reconnect with exponential backoff + jitter. Don't hammer a struggling server with instant retries; grow the delay (1s, 2s, 4s, … capped) and add randomness so thousands of clients disconnected by a deploy don't all reconnect at the same instant (the thundering-herd / reconnect-storm problem from §16).
- Re-sync after reconnect. A new connection means a possible gap in the stream. On reconnect, the client should reconcile: fetch current state via a normal HTTP call, or tell the server "last event I saw was N, send me everything since" (a resume token / sequence number). This is where a durable backplane like Kafka (§15) shines — the missed messages are still in the log to replay.
- Decide your delivery guarantee. Like Kafka (§9), WebSocket delivery isn't magically exactly-once. For critical data, give messages sequence numbers/IDs so the client can detect gaps and duplicates, and make handling idempotent. For lossy data (live positions), a gap is fine — the next update corrects it.
- Buffer outbound on the client while disconnected (within limits), and flush on reconnect — so a user's action taken during a blip isn't simply lost.
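The first item on that list can be sketched as a small loop. The delay uses the "full jitter" variant (a uniform random delay up to the exponential cap), one common choice among several. `session` is a hypothetical coroutine you supply that connects, re-syncs, and reads until the connection ends; `max_failures` exists mainly so the loop can be tested.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """'Full jitter': uniform between 0 and min(cap, base * 2**attempt) seconds."""
    # Clamp the exponent so a very long outage can't overflow the float conversion.
    return random.uniform(0, min(cap, base * 2 ** min(attempt, 32)))


async def run_forever(
    session: Callable[[], Awaitable[None]],
    *,
    base: float = 1.0,
    cap: float = 30.0,
    max_failures: Optional[int] = None,
) -> None:
    """Run `session` (connect, re-sync, read until the connection ends) in a loop."""
    failures = 0
    while max_failures is None or failures < max_failures:
        try:
            await session()
            failures = 0   # connected and ended normally: restart the backoff
        except Exception:  # refused, dropped, timed out; CancelledError still stops the loop
            failures += 1
        await asyncio.sleep(backoff_delay(failures, base, cap))
```

Because every client draws its own random delay, a fleet disconnected by the same deploy spreads its reconnects across the window instead of arriving in one spike.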
The connection dropping is normal operation, not an error case — so reconnection logic isn't a nice-to-have, it's core functionality you build and test deliberately. A real-time system's reliability is mostly determined by how gracefully it handles the constant churn of connections coming and going. Design for the drop.
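On the server side, the "send me everything since N" form of re-sync can be sketched as a bounded per-room replay buffer. This in-memory `ReplayBuffer` is illustrative and only covers one instance; a durable backplane like Kafka (§15) plays the same role fleet-wide, since the log itself holds the history to replay.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Recent messages for one room, stamped with increasing sequence numbers."""

    def __init__(self, capacity: int = 1000) -> None:
        self._items: deque = deque(maxlen=capacity)  # (seq, message); oldest evicted first
        self._next_seq = 1

    def append(self, message: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._items.append((seq, message))
        return seq

    def since(self, last_seen: int) -> Optional[list]:
        """Messages with seq > last_seen, or None if some were already evicted.

        None means the gap is older than the buffer: the client must fetch a
        full snapshot over plain HTTP instead of replaying.
        """
        oldest = self._items[0][0] if self._items else self._next_seq
        if last_seen + 1 < oldest:
            return None
        return [(s, m) for s, m in self._items if s > last_seen]
```

On reconnect the client sends the last sequence number it processed, and the server either replays `since(n)` or tells it to re-sync from a snapshot.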
WebSockets vs SSE vs Streaming vs Polling
The decision that should come before you build anything real-time. WebSockets are powerful but carry the operational weight of Part IV (stateful connections, backplane, sticky sessions, reconnection), so picking the simplest tool that fits is a real cost saving. Four contenders, drawing together the spectrum (§2) and the gRPC chapter.
| Need | Best fit | Why |
|---|---|---|
| Rare updates, latency tolerant | Polling | Trivial, stateless, no infra. Don't over-engineer. |
| Server → browser push, one-way | SSE | Just HTTP, auto-reconnect, sails through proxies; far simpler than WS. Feeds, notifications, live dashboards. |
| Browser ↔ server, frequent both ways | WebSocket | The only option for true low-latency bidirectional in a browser. Chat, games, collaborative editing, trading UIs. |
| Service ↔ service streaming | gRPC streaming | Typed, efficient, HTTP/2-native (gRPC chapter). For internal backends, not browsers. |
Move left until it no longer works. Need server push to a browser but not client→server chatter? SSE, not WebSockets: you shed most of Part IV's WebSocket-specific burden (upgrade-aware load balancing, hand-rolled reconnection), though fanning events out across instances still needs a pub/sub layer (§15). Genuinely need frequent bidirectional browser traffic? WebSockets. Two backend services? gRPC streaming. Reaching for WebSockets when SSE or even polling would do is one of the most common and costly over-engineering mistakes in real-time work.
Common Pitfalls
The failure modes that bite WebSocket servers in production — most are about resource management for long-lived, stateful connections, a discipline stateless HTTP never demanded.
| Pitfall | What happens | Fix |
|---|---|---|
| Connection / goroutine leaks | A disconnect path forgets to unregister & free; goroutines/tasks and memory accumulate until the server dies | Always clean up on every exit path (defer/finally); register on connect, unregister on disconnect (§10) |
| No heartbeat → zombies | Dead connections look idle (§6); the table fills with sockets to clients that vanished | Ping on a timer; drop on missed pong (§7) |
| Unbounded send buffers | One slow client's queue grows without limit and OOMs the server | Bounded per-client queue + drop/disconnect policy (§12) |
| Concurrent writes to one conn | Multiple goroutines writing the same socket corrupt the frame stream | One writer goroutine fed by a channel (§8, §10) |
| Blocking the read loop | Doing slow work inline in the read loop stalls that connection (and trips heartbeats) | Hand work to a worker; keep the read loop fast (the queue idea, ch.10) |
| No origin check | Cross-Site WebSocket Hijacking on cookie-auth endpoints (§14) | Allowlist origins; never blanket-allow |
| Forgetting the backplane | Works on one instance; messages vanish between users once you scale out (§15) | Design the pub/sub backplane in early |
| No reconnection logic | Client silently stops receiving after the first network blip or deploy | Reconnect with backoff + jitter, then re-sync (§17) |
Notice the pattern: nearly every pitfall is a resource that wasn't bounded or freed — a leaked connection, an unbounded buffer, a zombie socket. Stateless HTTP forgives sloppiness here because each request is short-lived and self-cleaning; a WebSocket lives for hours and holds memory the whole time. Treat every connection as something you must explicitly account for from open to close, and most production problems never appear.
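Most rows of the table share one fix: bound it and free it. The following asyncio sketch shows that shape with a hypothetical `Hub`/`Client` pair. A `send`/`recv` pair stands in for a real connection's methods, and the disconnect-on-overflow policy is chosen only for illustration. The sketch shows a bounded per-client queue, exactly one writer task per connection, and unregistration in a `finally` so every exit path cleans up. It must run inside an event loop, because `Client` starts its writer with `asyncio.create_task`.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: these stand in for a real connection's async send()/recv().
Send = Callable[[str], Awaitable[None]]
Recv = Callable[[], Awaitable[str]]


class Client:
    """Outbound side of one connection: a bounded queue drained by ONE writer task."""

    def __init__(self, send: Send, max_queue: int) -> None:
        self._send = send
        self.queue: asyncio.Queue[str] = asyncio.Queue(maxsize=max_queue)
        # The single writer: no other coroutine ever calls self._send.
        self.task = asyncio.create_task(self._write_loop())

    async def _write_loop(self) -> None:
        while True:
            msg = await self.queue.get()
            await self._send(msg)

    def close(self) -> None:
        self.task.cancel()  # stop the writer; a real server would also close the socket


class Hub:
    """Central registry: register on connect, unregister on every exit path."""

    def __init__(self) -> None:
        self.clients: set[Client] = set()

    def connect(self, send: Send, max_queue: int = 64) -> Client:
        client = Client(send, max_queue)
        self.clients.add(client)
        return client

    def disconnect(self, client: Client) -> None:
        if client in self.clients:
            self.clients.discard(client)
            client.close()

    def broadcast(self, msg: str) -> None:
        for client in list(self.clients):
            try:
                client.queue.put_nowait(msg)  # never block the broadcaster on one client
            except asyncio.QueueFull:
                # Policy (drop / disconnect / conflate): here, disconnect the slow client.
                self.disconnect(client)

    async def handle(self, send: Send, recv: Recv) -> None:
        """Per-connection entry point: register, run the read loop, always unregister."""
        client = self.connect(send)
        try:
            while True:
                msg = await recv()   # assumed to raise when the peer goes away
                self.broadcast(msg)  # keep the read loop fast: enqueue, don't send
        finally:
            self.disconnect(client)  # runs on normal close, error, or cancellation
```

Disconnecting is only one of the overflow policies. Dropping the new message, or conflating the queue to "latest value only", are equally valid for lossy data; the non-negotiable part is that the queue has a `maxsize`.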
Designing Real-Time Systems
Pulling the manual together into design judgment for anything real-time.
Design principles
- Choose the lightest transport first. Polling → SSE → WebSocket → gRPC streaming, leftmost that fits (§18). Don't pay for WebSocket complexity unless you need bidirectional browser traffic.
- Design the backplane in from day one if you'll ever run more than one instance — which is always, in production (§15). Retrofitting cross-instance delivery into a single-instance design is painful.
- Treat disconnection as normal. Heartbeats to detect it (§7), graceful drain on deploy (§16), client reconnection with backoff + jitter and state re-sync (§17). The connection will churn; build for it.
- Bound everything per connection. Send-buffer size, message size, message rate, connections per user (§12, §14, §19). Unbounded anything is an outage waiting for one bad client.
- Don't assume reliable, ordered, exactly-once delivery. Add sequence numbers/IDs for gap detection and idempotent handling where it matters; accept loss where it doesn't (§17, echoing Kafka §9).
- Authenticate at the handshake, authorize every message, secure the channel: wss://, origin checks, validated inbound messages (§13–14).
- Keep the read loop fast; offload real work to a queue/worker (ch.10), and serialize writes through one writer per connection (§8).
Reach for WebSockets for genuinely interactive, low-latency, bidirectional browser experiences: chat, multiplayer games, collaborative editing, live trading/betting interfaces, real-time dashboards with client interaction. Don't reach for them for one-way server push (use SSE), for occasional updates (polling), for request/response (plain HTTP/REST — the HTTP chapter), or for service-to-service streaming (gRPC). The skill, as always, is matching the transport to the actual interaction shape rather than reaching for the most powerful tool by reflex.
Cheat-Sheet
The whole manual compressed to what you reach for under pressure.
| Concept | One-liner |
|---|---|
| Why WebSockets | HTTP can't push; WS is a persistent full-duplex pipe either side can send on anytime. |
| Spectrum | Polling → long-polling → SSE (one-way) → WebSocket (two-way). Pick the simplest. |
| What it is | One long-lived TCP connection, started via HTTP, on ports 80/443. ws:// / wss://. |
| Handshake | HTTP GET + Upgrade: websocket → 101 Switching Protocols. Not 200. |
| Frames | FIN, opcode (text/binary/close/ping/pong), client→server masked; message-oriented, not byte-stream. |
| Lifecycle | open → messages → close (with a code). 1006 = died with no close frame. |
| Heartbeats | Ping/pong on a timer; missed pong = dead → close + clean up. Always run them. |
| Server | Accept upgrade, then a read loop; library does framing (gorilla / FastAPI / websockets). |
| Concurrency | One writer per connection; never write the same socket from many goroutines. |
| Hub | Central registry owns connections; register/unregister/broadcast. Always clean up. |
| Rooms | Group connections by name; broadcast to a group, not everyone. |
| Backpressure | Bounded per-client buffer; on overflow drop / disconnect / conflate. Never buffer unbounded. |
| Auth | At the handshake (cookie/query/subprotocol/first msg) — browsers can't set headers. Authz every message. |
| Security | wss:// + validate Origin (CSWSH) + validate messages + size/rate limits. |
| Backplane | Connections are pinned per instance; Redis/Kafka pub/sub lets servers deliver across instances. |
| Load balancing | LB must speak WS; raise idle timeouts; affinity; drain (1001) on deploy. |
| Deploys | Every rolling deploy disconnects all clients — reconnection is mandatory. |
| Reconnection | Backoff + jitter, then re-sync missed state (resume token / HTTP fetch). |
| Delivery | Not exactly-once; add sequence IDs + idempotency where gaps matter. |
| vs others | SSE for one-way push; gRPC streaming for service↔service; polling for rare updates. |
| Pitfalls | Leaks, zombies, unbounded buffers, concurrent writes, no origin check, no backplane. |
| Design rule | Lightest transport that fits; bound everything; treat disconnection as normal. |
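Two rows of the table, Handshake and Frames, are small enough to check by hand. The following sketch shows what a library like gorilla/websocket or websockets does for you under the hood. It derives `Sec-WebSocket-Accept` from the client's `Sec-WebSocket-Key` (the SHA-1 of the key concatenated with the fixed GUID from RFC 6455, base64-encoded), and it decodes one frame header. The parser is deliberately partial: it assumes the buffer holds exactly one complete frame, and it skips fragmentation, extensions, and validation. `accept_key` and `parse_frame` are illustrative names.

```python
import base64
import hashlib
import struct

# Fixed GUID defined by RFC 6455, appended to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept with its 101 response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def parse_frame(buf: bytes) -> tuple[bool, int, bytes]:
    """Decode one complete frame into (fin, opcode, payload). Sketch only."""
    b0, b1 = buf[0], buf[1]
    fin = bool(b0 & 0x80)
    opcode = b0 & 0x0F        # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked = bool(b1 & 0x80)  # must be set on client→server frames
    length = b1 & 0x7F
    offset = 2
    if length == 126:         # 16-bit extended payload length follows
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:       # 64-bit extended payload length follows
        (length,) = struct.unpack_from("!Q", buf, offset)
        offset += 8
    mask = b""
    if masked:
        mask = buf[offset:offset + 4]
        offset += 4
    payload = buf[offset:offset + length]
    if masked:                # unmask: XOR each byte with the 4-byte key, cycling
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload
```

The sample key `dGhlIHNhbXBsZSBub25jZQ==` and the unmasked and masked "Hello" frames are worked examples in RFC 6455 itself, so they make handy test vectors. In production, let the library do this.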
The whole topic in one breath: WebSockets exist because HTTP can't let the server push, and polling is wasteful (§1), but they're the heaviest option on a spectrum that includes polling and one-way SSE (§2). A WebSocket is a single persistent full-duplex TCP connection (§3) that begins as an HTTP Upgrade handshake answered with 101 (§4), then exchanges message-oriented frames (§5) through an open→message→close lifecycle (§6), kept honest by ping/pong heartbeats since dead connections look idle (§7). You build a server as accept-then-read-loop (§8), often act as a client too (§9), and manage many connections through a central hub (§10) with rooms for grouping (§11) and bounded buffers for backpressure so one slow client can't sink the server (§12). In production you authenticate at the handshake and authorize every message (§13), secure with wss:// and origin checks (§14), and (the hard part) scale stateful connections across instances with a pub/sub backplane (§15), behind a WebSocket-aware load balancer that drains on deploy (§16), with clients that reconnect with backoff and re-sync (§17). Above all: pick the lightest transport that fits (§18), avoid the resource-leak pitfalls (§19), and treat disconnection as the normal case (§20).
Grounded in MDN (developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) & RFC 6455 · gorilla/websocket & the websockets/FastAPI docs · Go 1.22+ / Python 3.11+ examples.