A detailed backend reference

Containers, shipping your code and its world together.

A first-principles walkthrough of how backend code gets packaged and run in production — from the kernel features that make a container (namespaces and cgroups), through Docker images, layers and multi-stage builds, into Kubernetes orchestration (pods, deployments, services, probes, autoscaling), and out to CI/CD pipelines and zero-downtime rollouts. Written to explain not just what each piece does but why it exists and how it works underneath. Application examples in both Go and Python.

Linux namespaces + cgroups · Docker / OCI · Kubernetes · 21 sections
Part I · Why Containers Exist
01

The Problem Containers Solve

Every container concept later in this manual is an answer to one recurring pain: code that runs on one machine fails on another because the two machines aren't identical. Your laptop has Python 3.11, the server has 3.9; your build assumes a system library that prod lacks; an environment variable is set here and unset there. The result is the oldest joke in software — "but it works on my machine" — and it is not a joke, it's a class of outages.

The root cause is that a running program is never just its source code. It's the code plus a whole invisible environment: the language runtime and its exact version, third-party libraries, system packages, configuration, file paths, and OS-level dependencies. Traditionally you reproduced that environment by hand on every machine — install steps, setup scripts, a wiki page nobody kept current — and any drift between environments became a bug that only appears in one place.

Before containers, deploying software was like mailing a recipe and hoping the recipient's kitchen has the same oven, the same pans, and the same brand of flour. Containers mail the entire kitchen — oven, pans, ingredients, pre-measured — sealed in a box that runs the same way wherever you open it.
The deployment gap
A program is code plus an environment — and the environment is what drifts
[Figure: without containers, the same code meets different worlds: on your laptop the app has Python 3.11 and libs A, B (v2) and works; on the server it has Python 3.9, lib A (v1), no B, and breaks ("works on my machine"). With a container, the environment travels with the code: one image holding app + runtime + libs + config runs on laptop, staging, and prod.]
A container packages the application and its entire runtime environment into one immutable artifact, so "the thing that runs" is byte-identical everywhere.

Three problems, one solution

  • Dependency hell & drift — the exact runtime and libraries are baked in, so there's no "install steps" to get wrong and no version skew between environments.
  • Isolation — two apps that need conflicting versions of the same library can run side by side on one host, each in its own sealed box, without interfering.
  • Density & portability — containers are light enough to pack many onto a single machine and to move freely between laptop, CI, and any cloud — the foundation that orchestration (Part IV) and modern CI/CD (Part V) build on.
The one idea

A container makes "the thing that runs in production" identical to "the thing you built and tested." Everything else — images, Docker, Kubernetes, pipelines — is machinery for building, shipping, and running that identical artifact at scale.

02

Containers vs Virtual Machines

The instinctive question is "isn't this just a virtual machine?" The answer reveals what containers actually are. A VM virtualizes hardware: a hypervisor emulates a whole machine, and each VM runs its own complete operating system — its own kernel, on top of which sit its libraries and your app. A container virtualizes the operating system: all containers on a host share the host's single kernel, and isolation is enforced by kernel features rather than by emulating separate hardware.

The decisive difference
VMs each ship a full OS; containers share one kernel
[Figure: two stacks on the same physical hardware. Virtual machines: App A and App B, each with its own bins/libs and its own guest OS + kernel, on a hypervisor over the host OS + kernel. Containers: Apps A, B, and C, each with bins/libs and no kernel, on a container runtime (e.g. containerd) over one shared host OS kernel.]
No per-app guest OS means containers start in milliseconds, weigh megabytes not gigabytes, and you can pack far more of them onto the same hardware.
A VM's unit of isolation is a whole emulated computer; a container's is a process group fenced off by the kernel. That single architectural choice explains every practical difference below.
Dimension | Virtual Machine | Container
Isolates by | Emulating hardware (hypervisor) | Kernel features (namespaces + cgroups)
OS per instance | Full guest OS + own kernel | Shares the host kernel; no guest OS
Size | Gigabytes | Megabytes
Start time | Seconds to minutes (boot an OS) | Milliseconds (start a process)
Density per host | Tens | Hundreds to thousands
Isolation strength | Stronger (separate kernels) | Weaker (shared kernel), a real tradeoff
The shared-kernel tradeoff

Sharing one kernel is what makes containers light and is their main security caveat: a kernel-level escape affects every container on the host, whereas VMs have a thicker boundary. This is why multi-tenant platforms often run containers inside VMs, and why container security (§17, and the security chapter) focuses on least privilege, non-root users, and dropping capabilities. Containers are isolation, not a hard security sandbox by default.

03

What a Container Actually Is

Demystified: a container is just a normal Linux process (or a few) that the kernel has been told to isolate. There is no "container" object in the kernel — the magic is three older Linux features combined so a process believes it has a machine to itself. Understanding these three removes all the mystery and explains every limit and behavior you'll hit.

The three ingredients
A container = namespaces + cgroups + a filesystem image
[Figure: three columns. Namespaces (what it can see): pid, its own process tree; net, its own interfaces; mnt, its own mounts; uts/ipc/user, hostname, … cgroups (what it can use): CPU shares / quota; memory limit; I/O bandwidth; pids count. Union FS (what it runs on): the image's layers; read-only base + libs; thin writable top layer; its private root “/”.]
Combine the three and an ordinary process is convinced it owns the machine; that illusion is a container.

1 · Namespaces — the illusion of being alone

A namespace partitions a global kernel resource so the process only sees its own slice. The PID namespace makes the container's main process believe it's PID 1 with no siblings; the network namespace gives it its own interfaces and ports; the mount namespace gives it its own view of the filesystem. From inside, it looks like a private machine. From the host, it's just processes with extra labels — ps on the host sees them all.
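To make "processes with extra labels" concrete, here is a minimal sketch that lists the namespaces a process belongs to. It assumes a Linux host with /proc mounted; the function name namespace_ids is illustrative, not part of any library.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc.

    Each entry under /proc/<pid>/ns is a symlink whose target looks like
    'pid:[4026531836]'. Two processes share a namespace exactly when the
    identifiers match. Returns an empty dict where /proc is unavailable
    (for example on macOS) or the process does not exist.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in sorted(names):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without extra privileges.
            continue
    return result


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"{name:10s} {ident}")
```

Run once on the host and once inside a container: the identifiers for pid, net, and mnt should differ, which is all "being in a container" means at this level.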

2 · cgroups — enforcing limits

Namespaces hide the rest of the system; control groups (cgroups) cap how much the process may consume — CPU, memory, I/O, process count. This is what stops one container from starving its neighbors, and it's exactly the mechanism Kubernetes drives when you set resource requests and limits (§17). Exceed a memory limit and the kernel's OOM killer terminates the container — the famous OOMKilled status.
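A process can read the limit its cgroup imposes. The sketch below assumes cgroup v2 mounted at /sys/fs/cgroup (the usual view from inside a container on recent distributions); on cgroup v1 hosts the file lives elsewhere, and the function names are illustrative.

```python
from pathlib import Path


def parse_memory_max(raw):
    """Parse the contents of a cgroup v2 'memory.max' file.

    The file holds either the literal string 'max' (no limit) or a byte
    count. Returns None for 'no limit', otherwise the limit in bytes.
    """
    value = raw.strip()
    if value == "max":
        return None
    return int(value)


def read_memory_limit(path="/sys/fs/cgroup/memory.max"):
    """Read this process's memory limit, or None if unlimited or unreadable."""
    try:
        return parse_memory_max(Path(path).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_memory_limit()
    print("memory limit:", "none" if limit is None else f"{limit / 2**20:.0f} MiB")
```

An application might use a value like this to size caches or worker pools so it stays under the limit instead of being killed for exceeding it.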

3 · A filesystem image — its own root

Finally the process needs files: its own / with a runtime, libraries, and your binary. That comes from the image, layered onto the mount namespace as the container's root filesystem (§04). The process can't see the host's files, only the image's — plus whatever you explicitly mount in (§09).

Why "Linux containers" specifically

Namespaces and cgroups are Linux kernel features, so containers are natively a Linux technology. Docker on macOS or Windows quietly runs a lightweight Linux VM and your containers live inside it — which is why a "container" on your Mac is really Linux-in-a-VM under the hood. The artifact is portable; the kernel features it needs are Linux's.

Part II · Docker — Images & Containers
04

Images, Layers & the Union Filesystem

Two words get confused constantly, so nail them first. An image is the immutable, on-disk template: a packaged filesystem plus metadata (what to run, which ports, env). A container is a running instance of an image: the image brought to life as an isolated process (§03). The relationship is class to object: one image, many containers.

An image is to a container what a class is to an object, or what a baked, frozen meal is to the same meal reheated on your plate. You build the image once; you run many identical containers from it.

Images are stacks of read-only layers

The clever part is how an image stores its filesystem: as a stack of layers, each one a set of file changes, stacked by a union filesystem that merges them into a single coherent /. Each filesystem-changing instruction in your build (RUN, COPY, ADD) adds one layer (§05). Layers are immutable and content-addressed, so identical layers are stored once and shared across images: if ten images use the same base, that base lives on disk once and is downloaded only once per host.
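Content addressing is easy to model. The following toy store is a sketch only: real registries address compressed layer archives by digest, while here a layer is just a byte string and the class and function names are made up for illustration.

```python
import hashlib


class LayerStore:
    """A toy content-addressed store: each blob is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self.blobs.setdefault(digest, data)
        return digest


def build_image(store, layers):
    """An 'image' here is just the ordered list of its layer digests."""
    return [store.put(layer) for layer in layers]


if __name__ == "__main__":
    store = LayerStore()
    base = b"debian base files"
    image_a = build_image(store, [base, b"service A binary"])
    image_b = build_image(store, [base, b"service B binary"])
    print(image_a[0] == image_b[0])  # the base layer digest is shared
    print(len(store.blobs))          # number of unique blobs actually stored
```

Two images with four layers between them occupy three blobs, because the shared base is keyed by what it contains rather than by which image it belongs to.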

Union filesystem
Read-only layers shared across containers, each with its own thin writable top
[Figure: an image is a stack of read-only layers: L1 base image (e.g. debian), 50 MB; L2 apt-get runtime libs, +25 MB; L3 pip/go install deps, +40 MB; L4 COPY app binary, +2 MB. These four layers are stored once and shared. Containers C1 and C2 each add their own writable top layer over the same read-only layers.]
Copy-on-write: a container reads straight from the shared layers. The instant it modifies a file, that file is copied up into its own writable layer; the image underneath is never touched. That writable layer is ephemeral: delete the container and those changes vanish (§09 for persistence).
Layer sharing is why pulling your second image based on the same runtime is nearly instant, and why image size is dominated by your base choice (§07).
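The lookup-and-write behavior of a union filesystem can be imitated with Python's collections.ChainMap, which searches a list of mappings in order and sends every write to the first one. This is an analogy under simplifying assumptions (files are dictionary entries, paths and contents are invented), not how overlay filesystems are implemented.

```python
from collections import ChainMap

# Read-only image layers, listed top-most first (upper layers shadow lower ones).
IMAGE_LAYERS = [
    {"/server": "app binary v1"},                        # COPY app binary
    {"/etc/app.conf": "log_level=info"},                 # config baked in at build
    {"/bin/sh": "shell", "/etc/os-release": "debian"},   # base image
]


def start_container(image_layers):
    """Return a merged view: one fresh writable dict on top of shared layers."""
    return ChainMap({}, *image_layers)


if __name__ == "__main__":
    c1 = start_container(IMAGE_LAYERS)
    c2 = start_container(IMAGE_LAYERS)

    c1["/etc/app.conf"] = "log_level=debug"   # lands in c1's writable layer only

    print(c1["/etc/app.conf"])   # c1 sees its own modified copy
    print(c2["/etc/app.conf"])   # c2 still reads the shared image layer
    print(c1.maps[0])            # everything c1 has written so far
```

Discarding c1 discards its first mapping and nothing else, which mirrors why container-local writes disappear with the container.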
The writable layer is not storage

Anything a container writes to its own filesystem dies with the container. Logs, uploads, database files written "inside" are gone on restart. Persistent data must live in a volume (§09) or an external service. This ephemerality is a feature — it's what makes containers disposable and replaceable — but it's the most common beginner trap.

05

The Dockerfile

A Dockerfile is the recipe that builds an image: a sequence of instructions, executed top to bottom. Instructions that change the filesystem (RUN, COPY, ADD) each produce one layer (§04); others, such as ENV, EXPOSE, and CMD, only record metadata. docker build reads the file, runs each step, and caches the result. Because the file is plain text in your repo, your build is reproducible and reviewable: the environment becomes code. Here is the same web service containerized in Go and in Python:

Dockerfile (Go)
# FROM picks the base image, the first (bottom) layer. Pin a version, never :latest
FROM golang:1.22

# WORKDIR sets the working dir for the instructions that follow (and at runtime)
WORKDIR /app

# Copy dependency manifests FIRST and download — this layer caches across code edits (§7)
COPY go.mod go.sum ./
RUN go mod download

# Now copy the source and build the binary
COPY . .
RUN go build -o /server ./cmd/server

# Document the port the app listens on (metadata; publishing happens at run time §8)
EXPOSE 8080

# CMD is the default command run when a container starts from this image
CMD ["/server"]
Dockerfile (Python)
# Pin a slim base: smaller and fewer CVEs than the full image
FROM python:3.11-slim

WORKDIR /app

# Copy the manifest FIRST so the dependency layer caches across code edits (§7)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Then copy the application code
COPY . .

EXPOSE 8080

# Exec form (JSON array) — runs the process directly as PID 1, so signals work (§ graceful shutdown)
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
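The Python Dockerfile's CMD starts uvicorn with app:app, meaning "the object named app in the module app". The original does not show that module, so the following is a hypothetical stand-in: a dependency-free ASGI callable, plus a small helper (call_once, also hypothetical) that exercises it in-process without a server.

```python
# app.py: a dependency-free ASGI application (hypothetical example service).
import asyncio
import json


async def app(scope, receive, send):
    """Answer every HTTP request with a small JSON body."""
    if scope["type"] != "http":
        return  # this sketch ignores anything that is not an HTTP request
    body = json.dumps({"status": "ok", "path": scope["path"]}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_once(path):
    """Drive the app in-process, without a server, and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_once("/healthz")):
        print(message)
```

A real service would more likely use a framework; the point is only that the image's CMD names a module and an object, and that both must exist in the files copied into /app.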
Instruction | What it does
FROM | The base image, the starting layer. Everything builds on it.
WORKDIR | Sets (and creates) the current directory for later steps and at runtime.
COPY / ADD | Copy files from your build context into the image. Prefer COPY; ADD has surprising URL/tar behavior.
RUN | Execute a command at build time (install deps, compile) and bake the result into a layer.
ENV | Set environment variables baked into the image (config defaults, §16).
EXPOSE | Documents the listening port. Informational; doesn't publish it (§08).
ENTRYPOINT / CMD | What runs when the container starts. ENTRYPOINT = the executable, CMD = default args.
Build context & RUN vs CMD

Two frequent confusions. The build context is the directory you point docker build at — everything in it is sent to the builder, so a bloated context (e.g. node_modules, a .git folder) slows builds; trim it with .dockerignore (§07). And RUN executes at build time (baked into the image) while CMD/ENTRYPOINT execute at run time (when a container starts). Mixing them up is a classic early mistake.

06

Multi-Stage Builds

The single most impactful technique for production images. The problem: building needs heavy tools (compilers, the full SDK, build-time dev dependencies) that the running app doesn't. Ship them and your image is huge and full of attack surface. A multi-stage build uses one stage to build and a second, clean stage that copies only the finished artifact — the toolchain is discarded.

Multi-stage
Build in a heavy stage, copy only the artifact into a tiny final image
[Figure: Stage 1, builder (discarded): golang:1.22 / python full, with compiler / full SDK, build dependencies, your source tree, and intermediate objects; /server is the only keeper; ~900 MB, thrown away after build. COPY --from carries the artifact into Stage 2, final (shipped): distroless / alpine holding /server; ~15 MB, binary + minimal runtime only.]
Dockerfile (Go, multi-stage)
# ---- Stage 1: build with the full Go toolchain ----
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# static binary, no libc dependency — perfect for a scratch/distroless final image
RUN CGO_ENABLED=0 go build -o /server ./cmd/server

# ---- Stage 2: tiny runtime, no compiler, non-root ----
FROM gcr.io/distroless/static:nonroot
# copy ONLY the artifact across stages
COPY --from=builder /server /server
EXPOSE 8080
# never run as root (§17, security)
USER nonroot:nonroot
ENTRYPOINT ["/server"]
# Result: a ~10-15 MB image with no shell, no package manager, minimal CVEs
Dockerfile (Python, multi-stage)
# ---- Stage 1: install deps into a venv with build tools available ----
FROM python:3.11 AS builder
WORKDIR /app
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # may compile native wheels here

# ---- Stage 2: slim runtime, copy the ready venv, run non-root ----
FROM python:3.11-slim
# copy ONLY the built virtualenv
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
COPY . .
RUN useradd -m app && chown -R app /app
# non-root (§17)
USER app
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Go compiles to a single static binary, so the final stage can be near-empty (distroless/scratch). Python needs the interpreter, so the win comes from copying a pre-built virtualenv into a -slim base instead of carrying the compilers.

Why this matters in production

Smaller images pull faster (quicker autoscaling and rollouts — §18), cost less to store and transfer, and shrink the attack surface dramatically — a distroless image has no shell for an attacker to use. Multi-stage builds are the default for any serious service image.

07

Image Optimization & Layer Caching

Two levers make builds fast and images lean, and both follow directly from the layer model (§04). The first is cache ordering; the second is base-image choice plus pruning what you copy in.

Order instructions from least- to most-frequently-changed

Docker caches each layer and reuses it on the next build as long as that instruction and everything before it are unchanged. The moment one layer's inputs change, that layer and every layer after it are rebuilt. So the golden rule is: put the things that rarely change (installing dependencies) before the things that change every commit (copying your source). That's why every Dockerfile above copies the dependency manifest and installs before copying the full source — you edit code constantly but dependencies rarely, so the expensive install layer stays cached.

Cache invalidation
Change one layer and every layer built after it rebuilds
[Figure: deps before code (good): FROM base, COPY manifest, and RUN install deps stay cached; only COPY . . + build is rebuilt, so a code edit rebuilds just the top layer (seconds). Code before deps (bad): FROM base stays cached, but COPY . . busts the cache and RUN install deps is rebuilt, so deps reinstall on every code edit (minutes).]
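The invalidation rule falls out of how cache keys chain together. The sketch below is a simplified model, not Docker's actual key computation: each step's key hashes its parent's key plus its own instruction and inputs, so a change at one step alters that key and every later one. The manifest contents and helper names are invented for illustration.

```python
import hashlib


def cache_keys(steps):
    """Compute one cache key per build step.

    `steps` is a list of (instruction, input_bytes) pairs, where input_bytes
    stands for the content the instruction reads (the copied files, for COPY).
    """
    keys = []
    parent = b""
    for instruction, inputs in steps:
        h = hashlib.sha256()
        h.update(parent)
        h.update(instruction.encode())
        h.update(hashlib.sha256(inputs).digest())
        parent = h.digest()
        keys.append(parent.hex())
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Return the instructions whose cache key changed between two builds."""
    old, new = cache_keys(old_steps), cache_keys(new_steps)
    return [step[0] for step, a, b in zip(new_steps, old, new) if a != b]


def dockerfile(source, deps_first=True):
    """Model the two orderings; `source` is the application code's content."""
    manifest = b"example-dep==1.0"
    if deps_first:
        return [("FROM python:3.11-slim", b""),
                ("COPY requirements.txt .", manifest),
                ("RUN pip install -r requirements.txt", b""),
                ("COPY . .", manifest + source)]
    return [("FROM python:3.11-slim", b""),
            ("COPY . .", manifest + source),
            ("RUN pip install -r requirements.txt", b"")]


if __name__ == "__main__":
    for deps_first in (True, False):
        changed = rebuilt_steps(dockerfile(b"v1", deps_first),
                                dockerfile(b"v2", deps_first))
        print("deps first" if deps_first else "code first", "->", changed)
```

With dependencies first, a source edit changes only the final COPY; with code first, the same edit also changes the key of the install step that follows it.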

Pick a small base & copy less

  • Base image: -slim variants drop hundreds of MB; alpine is tiny (musl libc, watch for compatibility); distroless ships only your app and its runtime with no shell or package manager (most secure). Choose the smallest that runs your app.
  • .dockerignore: keeps junk out of the build context and the image (faster builds, smaller images, no secrets leaking in).
  • Combine related RUNs & clean up in the same layer: deleting a cache in a later layer doesn't shrink the image (the bytes still live in the earlier layer); clean within the same RUN.
.dockerignore — trim the build context
.git
node_modules
*.md
# never bake secrets into an image (§16)
.env
**/__pycache__
*.log
dist/
.vscode/
Cleanup must be in the same layer

RUN apt-get install ... on one line and RUN rm -rf /var/lib/apt/lists/* on the next does not reduce size — the files still exist in the install layer. Chain them: RUN apt-get update && apt-get install -y X && rm -rf /var/lib/apt/lists/* so the cleanup happens before the layer is sealed.

08

Runtime, Registries & Tags

You've built an image — now run it, share it, and version it. Three commands cover 90% of daily use, and three concepts (registry, tag, runtime) frame the lifecycle.

the everyday Docker lifecycle
# BUILD an image from the Dockerfile in . and name:tag it
docker build -t myapp:1.4.0 .

# RUN a container from it:
#   -d detached  -p host:container publishes a port  -e sets an env var (§16)
docker run -d -p 8080:8080 -e DATABASE_URL="$DB" myapp:1.4.0

# Inspect & debug
docker ps                      # running containers
docker logs -f <id>            # stream stdout/stderr (your app should log there §logging)
docker exec -it <id> sh        # shell INTO a running container to poke around
docker stop <id>               # sends SIGTERM, then SIGKILL after a grace period (§graceful shutdown)

# SHARE via a registry
docker tag  myapp:1.4.0 registry.example.com/team/myapp:1.4.0
docker push registry.example.com/team/myapp:1.4.0     # upload
docker pull registry.example.com/team/myapp:1.4.0     # download (what prod/K8s does)
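Since docker stop begins with SIGTERM, the process inside the container needs a handler if it is to finish in-flight work before the grace period ends. A minimal sketch, with a placeholder loop standing in for a real server; the names serve and shutdown_requested are illustrative.

```python
import signal
import threading

# Set once a termination signal arrives; the main loop checks it to exit cleanly.
shutdown_requested = threading.Event()


def _on_terminate(signum, frame):
    """Signal handler: record the request and return; the main loop does the rest."""
    shutdown_requested.set()


def install_handlers():
    """Handle SIGTERM (docker stop) and SIGINT (Ctrl-C) the same way."""
    signal.signal(signal.SIGTERM, _on_terminate)
    signal.signal(signal.SIGINT, _on_terminate)


def serve(poll_seconds=0.1, max_polls=None):
    """Stand-in for a server loop: work until shutdown is requested.

    `max_polls` exists only so this sketch can terminate on its own.
    """
    polls = 0
    while not shutdown_requested.is_set():
        if max_polls is not None and polls >= max_polls:
            break
        shutdown_requested.wait(poll_seconds)  # placeholder for real work
        polls += 1
    return "drained" if shutdown_requested.is_set() else "stopped without signal"


if __name__ == "__main__":
    install_handlers()
    print(serve(max_polls=3))
```

The handler only reaches the application if the application is the process that receives the signal, which is one reason the Dockerfiles use the exec (JSON array) form of CMD.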

Registry, repository, tag

A registry is the server that stores images (Docker Hub, GitHub Container Registry, AWS ECR, Google Artifact Registry). A repository is a named image within it; a tag is a label for a specific version (myapp:1.4.0). Pushing uploads your layers; pulling downloads them. Thanks to content-addressed layers (§04), only the layers the receiving side doesn't already have move over the wire. This push/pull cycle is the hand-off between your build (§19) and where the image runs (Part IV).

The image lifecycle
Build once, push to a registry, pull anywhere to run
[Figure: docker build (Dockerfile) → push → Registry (stores tagged layers) → pull → CI runs tests, staging, production (K8s).]
Never deploy :latest

latest is just a tag that moves; it doesn't mean "newest" and gives you no idea what is running. Two machines pulling :latest a day apart can get different images — the exact non-reproducibility containers exist to kill. Tag with an immutable version or the git commit SHA (myapp:1.4.0, myapp:git-9f2c1a) so a running container maps to exact source.
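A build script can enforce this rule mechanically. The helper below is a hypothetical sketch: it assembles an image reference and refuses the moving latest tag. The 7-character SHA prefix is an assumption of this example, not a Docker requirement.

```python
import re

# Assumption for this sketch: a git SHA is 7 to 40 lowercase hex characters.
_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def image_ref(registry, repository, tag):
    """Build 'registry/repository:tag', refusing empty or 'latest' tags."""
    if not tag or tag == "latest":
        raise ValueError("refusing to use a moving tag; pin a version or commit SHA")
    return f"{registry}/{repository}:{tag}"


def image_ref_for_commit(registry, repository, git_sha):
    """Tag an image after the commit it was built from, e.g. 'git-9f2c1a0'."""
    if not _SHA.match(git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return image_ref(registry, repository, f"git-{git_sha[:7]}")


if __name__ == "__main__":
    print(image_ref("registry.example.com", "team/myapp", "1.4.0"))
    print(image_ref_for_commit("registry.example.com", "team/myapp", "9f2c1a0b7d"))
```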

OCI: the standard underneath

"Docker image" is shorthand — the format is standardized by the Open Container Initiative (OCI), which is why images built with Docker run under other runtimes (containerd, CRI-O, Podman) and on Kubernetes without Docker installed. You build to a spec, not to one vendor.

Part III · Multi-Container & Local Dev
09

Networking & Volumes

Two needs appear the moment you run more than a toy: containers must talk to each other, and some data must outlive the container. Docker answers these with networks and volumes — and both directly address gaps left by the isolation model (§03) and the ephemeral writable layer (§04).

Networking — from isolated to connected

Each container gets its own network namespace. Containers on Docker's default bridge can reach each other only by IP address, which changes whenever a container is recreated. Put containers on the same user-defined bridge network and Docker gives them a private virtual LAN plus built-in DNS: a container reaches another simply by its name (db:5432). This is the foundation of service discovery, and it's why Compose (§10) and Kubernetes (§15) let services address each other by name instead of brittle IPs. Separately, port publishing (-p 8080:8080) pokes a hole from the host into a container so the outside world can reach it.

Container networking
A user-defined network gives name-based discovery; a published port exposes one service
[Figure: bridge network appnet holding api:8080 and db:5432; api connects to "db" by name; host :8080 → api :8080 (published); db is not published and stays private to the network.]

Volumes — data that survives

Because a container's writable layer is destroyed with the container (§04), anything that must persist (a database's files, user uploads) lives in a volume: storage managed by Docker (or the host) and mounted into the container, decoupled from its lifecycle. Delete and recreate the container; the volume and its data remain. There are two main flavors: named volumes (Docker-managed, the default for real data) and bind mounts (a host directory mapped in, handy in development to live-edit code without rebuilding). A third option, tmpfs, keeps scratch data in memory only.

Mechanism | Use it for
Named volume (-v dbdata:/var/lib/postgresql/data) | Persistent app data managed by Docker: databases, caches you want to keep.
Bind mount (-v ./src:/app/src) | Dev-time live reload: edit on the host, see changes in the container instantly.
tmpfs | Sensitive scratch data kept in RAM only, never written to disk.
Stateless containers, stateful volumes

Treat containers as disposable and keep all durable state in volumes or external managed services. In Kubernetes especially, the prevailing pattern is stateless app containers (trivially scaled and replaced — §18) with state pushed to managed databases or persistent volumes. If killing a container loses data, the design has leaked state into the wrong place.

10

Docker Compose

Running a real app means several containers — API, database, cache, maybe a worker (the task-queue chapter's setup) — with the right networks, volumes, env, and start order. Wiring that by hand with many docker run commands is tedious and error-prone. Docker Compose declares the whole local stack in one YAML file and brings it up with a single command. It's the standard for local development and simple single-host deployments.

compose.yaml — an API (your Go or Python service), Postgres, and Redis
services:
  api:
    build: .                      # build the image from the local Dockerfile (§5/6)
    ports:
      - "8080:8080"               # publish to the host (§9)
    environment:                  # config via env, not baked in (§16)
      DATABASE_URL: postgres://app:secret@db:5432/app   # reach "db" by name (§9 DNS)
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy   # wait until the DB passes its healthcheck
      cache:
        condition: service_started

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - dbdata:/var/lib/postgresql/data   # named volume = data survives restarts (§9)
    healthcheck:                          # so depends_on can wait for readiness
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      retries: 5

  cache:
    image: redis:7

volumes:
  dbdata:                          # declared once, referenced above
running the stack
docker compose up -d        # build (if needed) + start everything in the background
docker compose ps           # status of all services
docker compose logs -f api  # tail one service's logs
docker compose down         # stop & remove containers + network (volumes kept)
docker compose down -v      # ...and delete the volumes too (fresh DB)
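On the application side, the api service only sees the environment variables that compose.yaml sets. A minimal sketch of reading them, assuming the variable names from the file above; the function names are illustrative.

```python
import os
from urllib.parse import urlsplit


def service_endpoint(url):
    """Extract (scheme, host, port) from a connection URL.

    Inside the Compose network the host part is a service name such as 'db',
    which Docker's built-in DNS resolves to the container's current address.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    return parts.scheme, parts.hostname, parts.port


def load_config(environ=os.environ):
    """Read connection settings from the environment, failing fast if missing."""
    try:
        return {
            "database": service_endpoint(environ["DATABASE_URL"]),
            "cache": service_endpoint(environ["REDIS_URL"]),
        }
    except KeyError as missing:
        raise RuntimeError(f"required environment variable not set: {missing}") from None


if __name__ == "__main__":
    example_env = {
        "DATABASE_URL": "postgres://app:secret@db:5432/app",
        "REDIS_URL": "redis://cache:6379",
    }
    print(load_config(example_env))
```

Because the hosts are names rather than addresses, the same code runs unchanged when the database container is recreated with a new IP.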
Compose vs Kubernetes

Compose is for one host — superb for local dev and small deployments, but it doesn't schedule across machines, self-heal, or autoscale. The instant you need many nodes, automatic restarts, rolling updates, and horizontal scaling, you've outgrown Compose and want an orchestrator (Part IV). The mental model carries over: a Compose service maps closely to a Kubernetes Deployment + Service.

Where we are
From one container to a fleet — the road into orchestration
[Figure: 1 container (docker run) → many containers on one host (docker compose) → a fleet across many nodes (Kubernetes, Part IV).]
Part IV · Kubernetes — Orchestration
11

Why Orchestration

Docker gives you containers; it does not, by itself, run them reliably across many machines. The moment you have dozens of containers on a fleet of servers, a new set of problems appears that no docker run can solve — and orchestration is the layer that solves them. Kubernetes (often "K8s") is the de-facto standard.

The problem at fleet scale | What an orchestrator does
Which machine should each container run on? | Scheduling: places containers on nodes by available CPU/memory.
A container (or a whole node) dies at 3am. | Self-healing: restarts crashed containers, reschedules off dead nodes, automatically.
Traffic spikes; you need more copies, then fewer. | Scaling: runs N replicas and adds/removes them on demand (§18).
Containers come and go; their IPs change. | Service discovery + load balancing: a stable address that fans out to healthy replicas (§15).
Ship a new version without downtime. | Rolling updates & rollback: replace replicas gradually, revert on failure (§18).
Config & secrets differ per environment. | Config management: inject settings/secrets without rebuilding the image (§16).

The core idea: declarative, desired state

The mindset shift that makes Kubernetes click: you don't issue imperative commands ("start a container here"). You declare the desired state ("I want 5 replicas of this image, reachable on this address") and Kubernetes continuously works to make reality match. If a container dies, actual drops to 4, and a control loop notices the gap and starts another — without you doing anything. You describe the what; Kubernetes handles the how and keeps it true.

Imperative is telling a driver every turn. Declarative is giving a destination to a self-correcting autopilot: drift off course — a crash, a dead node — and it steers back on its own, continuously, without new instructions.
Reconciliation in one line

Everything in Kubernetes is a control loop comparing desired state (what you declared) to actual state (what's running) and taking action to close the gap. Internalize that loop and the entire system — Deployments, scaling, healing, rollouts — becomes one idea applied over and over.
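The loop is small enough to write down. This is a toy model under obvious simplifications (pods are names in a set, actions are tuples, and all function names are invented), meant only to show the compare-and-act shape rather than how any Kubernetes controller is coded.

```python
import itertools

_pod_ids = itertools.count(1)


def reconcile(desired_replicas, running_pods):
    """One pass of a control loop: return the actions that close the gap.

    `running_pods` is the set of pod names observed right now. The result is
    a list of ('start', name) or ('stop', name) actions; an empty list means
    actual state already matches desired state.
    """
    gap = desired_replicas - len(running_pods)
    if gap > 0:
        return [("start", f"pod-{next(_pod_ids)}") for _ in range(gap)]
    if gap < 0:
        surplus = sorted(running_pods)[:-gap]
        return [("stop", name) for name in surplus]
    return []


def apply_actions(actions, running_pods):
    """Apply actions to the observed set (a stand-in for the real cluster)."""
    for verb, name in actions:
        if verb == "start":
            running_pods.add(name)
        else:
            running_pods.discard(name)


if __name__ == "__main__":
    pods = set()
    apply_actions(reconcile(5, pods), pods)   # initial rollout: 0 -> 5
    print(len(pods))
    pods.pop()                                # a pod crashes: 5 -> 4
    apply_actions(reconcile(5, pods), pods)   # the loop notices and heals
    print(len(pods))
    apply_actions(reconcile(2, pods), pods)   # desired state lowered: 5 -> 2
    print(len(pods))
```

Note that reconcile never asks why the gap exists. A crash, a scale-up, and a first deployment all look the same to it, which is what makes the approach self-healing.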

12

Kubernetes Architecture

A Kubernetes cluster is two kinds of machines: a control plane (the brain that decides what should run) and a set of worker nodes (the muscle where containers actually run). You talk only to the control plane — you submit desired state, and the cluster makes it happen.

Cluster anatomy
Control plane decides; nodes run; the reconciliation loop connects them
[Figure: control plane (the brain): api-server, the one front door; scheduler, picks a node; controller-mgr, runs the loops; etcd, the source of truth. All state lives in etcd; everything talks via the api-server. Worker nodes (the muscle): each runs a kubelet, kube-proxy (networking), and pods. You submit desired state (YAML) with kubectl apply.]
The api-server writes your desired state to etcd; controllers + scheduler reconcile it onto nodes; kubelets run the pods. Kubelets continuously report actual state back, closing the loop.
Component | Role
api-server | The single entry point. Every read/write of cluster state goes through it (your kubectl, every component).
etcd | Distributed key-value store holding the entire cluster state: the source of truth.
scheduler | Decides which node a new pod runs on, based on resource requests and constraints.
controller-manager | Runs the reconciliation loops (the Deployment controller, etc.) that drive actual toward desired.
kubelet | The agent on each node; starts/stops containers and reports their health up to the api-server.
kube-proxy | Programs node networking so Service virtual IPs route to the right pods (§15).
You rarely run this yourself

Managed Kubernetes (GKE, EKS, AKS) operates the control plane for you, so in practice you interact with the cluster purely through the api-server via kubectl and YAML. Knowing the parts still matters for debugging — "pod stuck in Pending" is usually the scheduler finding no node with room (§17).
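The "no node with room" case is easy to see in a toy scheduler. The sketch below filters nodes by whether the pod's requests fit and then picks one; the most-free-CPU preference is an assumption made for this example, and a real scheduler weighs many more factors.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_mib: int


def schedule(nodes, cpu_request, memory_request):
    """Pick a node with room for the pod's requests, or None if none fits."""
    candidates = [
        n for n in nodes
        if n.free_cpu_millicores >= cpu_request and n.free_memory_mib >= memory_request
    ]
    if not candidates:
        return None  # the pod would stay Pending
    chosen = max(candidates, key=lambda n: n.free_cpu_millicores)
    # Reserve the resources so the next pod sees the reduced capacity.
    chosen.free_cpu_millicores -= cpu_request
    chosen.free_memory_mib -= memory_request
    return chosen.name


if __name__ == "__main__":
    cluster = [Node("node-1", 1000, 2048), Node("node-2", 500, 4096)]
    for _ in range(4):
        print(schedule(cluster, cpu_request=500, memory_request=1024))
```

The fourth pod gets None even though node-2 still has memory to spare: a pod must fit on a single node in every requested dimension.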

13

Pods

The pod is Kubernetes' atomic unit — the smallest thing it schedules. Crucially, you do not deploy containers directly; you deploy pods, and a pod wraps one or more containers that are always co-located on the same node and share a network namespace and storage. Containers in a pod reach each other on localhost and can share mounted volumes — they're a single cooperating unit.

Usually a pod holds exactly one container (your app). The multi-container case is the sidecar pattern: a helper alongside the main app — a log shipper, a metrics exporter, or a service-mesh proxy (the Envoy from the gRPC chapter's §19). The sidecar shares the pod's network so it can observe or proxy the app's traffic transparently.

Pod internals
Containers in a pod share one network & can share storage
[Figure: a pod sharing one IP and lifecycle: an app container on :8080 and a sidecar (log shipper / proxy) talk over localhost. One pod IP, same node, live and die together.]
Pods are mortal and disposable

A pod is ephemeral: it can be killed and replaced at any time (node failure, scaling down, a rollout), and the replacement gets a new IP. So you never hard-code a pod's IP and never rely on a specific pod surviving. This is exactly why Services exist (§15) — to provide a stable address in front of a constantly-churning set of pods — and why you let a Deployment manage pods rather than creating them directly (§14).

14

Deployments & ReplicaSets

You almost never create pods by hand — a bare pod isn't replaced if it dies. Instead you declare a Deployment: "keep N replicas of this pod template running, always." The Deployment manages a ReplicaSet, whose one job is to maintain the replica count, and the ReplicaSet manages the pods. This is the reconciliation loop (§11) made concrete: declare 5, a pod dies, actual becomes 4, the controller starts a new one — self-healing, for free.

The ownership chain
Deployment → ReplicaSet → Pods, kept at the desired count
[Figure: Deployment (replicas: 3) manages a ReplicaSet (keeps count = 3), which manages three pods: two running, one died and is recreated by the controller.]
Actual fell to 2, desired is 3 → the gap is detected and a fresh pod is started, automatically.
deployment.yaml — declare 3 replicas of your image
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3                 # desired state: always keep 3 pods
  selector:
    matchLabels: { app: api } # which pods this Deployment owns
  template:                   # the POD TEMPLATE stamped out for each replica
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/team/myapp:1.4.0   # pinned tag, never :latest (§8)
          ports:
            - containerPort: 8080
          # resources, env, and probes go here too — see §16 and §17
driving it with kubectl
kubectl apply -f deployment.yaml     # submit desired state (declarative)
kubectl get pods -l app=api          # see the 3 pods it created
kubectl scale deployment/api --replicas=10   # change desired count → controller adds 7
kubectl rollout status deployment/api        # watch a new version roll out (§18)
kubectl rollout undo  deployment/api          # roll back to the previous version (§18)
Other workload kinds

Deployment is for stateless apps (the common case). Siblings exist for other shapes: StatefulSet for stateful apps needing stable identity/storage (databases), DaemonSet to run one pod per node (log/metrics agents), and Job/CronJob for run-to-completion and scheduled tasks (the batch side of the task-queue chapter). Same reconciliation idea, different guarantees.

15

Services & Ingress

Pods are mortal and their IPs churn (§13), so you can't point traffic at a pod. A Service is the fix: a stable virtual IP and DNS name in front of a dynamic set of pods, with built-in load balancing. It uses a label selector to track "all pods with app: api" — as pods come and go, the Service's set of healthy endpoints updates automatically, and callers never notice. This is Kubernetes' service discovery, and it's why one service addresses another by name (http://api), exactly as Compose did on one host (§9).

Service abstraction
A stable address that load-balances across whatever pods currently match
[Diagram: a caller requests http://api; the Service "api" (stable IP + DNS · selector app=api) forwards to two Ready pods labelled app=api; a removed pod is dropped.] Only Ready pods (per readiness probe §17) receive traffic; the rest are pulled from rotation.
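A minimal sketch of the selection rule, assuming a simplified `Pod` record with an IP, labels, and a readiness flag (all hypothetical names). Round-robin is used here only as one simple balancing policy; the actual distribution in a cluster depends on the proxy implementation.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Pod:
    ip: str
    labels: dict[str, str]
    ready: bool = True


def endpoints(selector: dict[str, str], pods: list[Pod]) -> list[str]:
    """IPs of Ready pods whose labels include every selector key/value."""
    return [
        p.ip
        for p in pods
        if p.ready and all(p.labels.get(k) == v for k, v in selector.items())
    ]


pods = [
    Pod("10.0.0.4", {"app": "api"}),
    Pod("10.0.0.7", {"app": "api"}, ready=False),  # failing readiness
    Pod("10.0.0.9", {"app": "web"}),               # label does not match
    Pod("10.0.0.12", {"app": "api"}),
]

backends = endpoints({"app": "api"}, pods)
print(backends)  # ['10.0.0.4', '10.0.0.12']

rr = cycle(backends)  # one simple balancing policy
print([next(rr) for _ in range(4)])
```

Callers only ever see the name `api`; the set behind it is recomputed whenever pods appear, disappear, or change readiness.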

Service types & Ingress

A ClusterIP Service (the default) is reachable only inside the cluster — perfect for service-to-service calls. To accept traffic from outside, you either use a LoadBalancer Service (the cloud provisions an external load balancer, one per service — expensive at scale) or, far more commonly for HTTP, an Ingress: a single L7 entry point that routes by host and path to many backing Services. Ingress is where TLS termination and host/path routing live — the cluster's public front door, and the natural home for the REST edge described in the gRPC chapter (§18–19 there).

Exposure | Reachable from | Use for
ClusterIP | Inside the cluster only | Service-to-service (most internal services)
NodePort | A port on every node | Basic/dev external access
LoadBalancer | The internet (cloud LB) | One TCP/UDP service exposed directly
Ingress | The internet, via L7 rules | HTTP host/path routing + TLS for many services through one entry point
service.yaml + ingress.yaml
apiVersion: v1
kind: Service
metadata:
  name: api               # other pods reach this as http://api (cluster DNS)
spec:
  selector: { app: api }  # front the pods labelled app=api (from the Deployment §14)
  ports:
    - port: 80            # the Service port callers use
      targetPort: 8080    # the containerPort it forwards to
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api          # route external HTTP to the api Service
                port: { number: 80 }
The full path of a request

External client → Ingress (TLS, host/path routing) → Service (stable IP, load-balance) → a healthy Pod → your container. Each hop adds one concern — routing, discovery, the actual work — and together they're how a packet from the internet reaches one replica of your app.

16

Config, Secrets & Environment

An image must be environment-agnostic: the same tested image runs in dev, staging, and prod — only the configuration differs (database URLs, feature flags, credentials). So configuration is injected at run time, never baked into the image. This is the containerized form of the config-management chapter's core rule, and the classic expression of twelve-factor "config in the environment." Kubernetes provides two objects: ConfigMaps for non-secret settings and Secrets for sensitive values.

Config injection
One image, many environments — config supplied from outside
[Diagram: one tested image, myapp:1.4.0, runs in three environments: dev (+ ConfigMap-dev + Secret-dev), staging (+ ConfigMap-stg + Secret-stg), and production (+ ConfigMap-prod + Secret-prod).]
configmap + secret, injected as env vars
apiVersion: v1
kind: ConfigMap
metadata: { name: api-config }
data:
  LOG_LEVEL: "info"
  FEATURE_NEW_CHECKOUT: "true"
---
apiVersion: v1
kind: Secret
metadata: { name: api-secrets }
type: Opaque
stringData:                       # plaintext here; the API server converts it to base64 `data` (not encrypted)
  DATABASE_URL: "postgres://app:secret@db:5432/app"
---
# ...inside the Deployment's container spec (§14):
#   envFrom:
#     - configMapRef: { name: api-config }    # all keys → env vars
#     - secretRef:    { name: api-secrets }   # secret keys → env vars
# Your app then reads os.Getenv("LOG_LEVEL") / os.environ["DATABASE_URL"] (§config chapter)
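On the application side, reading that injected configuration might look like the sketch below. The `Settings` class and `load_settings` function are hypothetical; the variable names match the ConfigMap and Secret above. Failing fast on a missing required value is a design choice, not something Kubernetes enforces.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    log_level: str
    new_checkout: bool
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; fail fast if one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # required: injected from the Secret
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set") from None
    return Settings(
        log_level=env.get("LOG_LEVEL", "info"),
        new_checkout=env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true",
        database_url=database_url,
    )


# Passing a plain dict stands in for the pod's environment.
print(load_settings({
    "LOG_LEVEL": "debug",
    "FEATURE_NEW_CHECKOUT": "true",
    "DATABASE_URL": "postgres://app:secret@db:5432/app",
}))
```

Crashing at startup on a missing value surfaces a misconfigured environment during the rollout, when readiness gating can still stop it, rather than on the first request that needs the database.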
Secrets are not encrypted by default

A Kubernetes Secret's values are only base64-encoded in the API, and by default they are stored unencrypted in etcd: base64 is encoding, not protection. For real safety, enable encryption at rest for etcd, lock down RBAC so few identities can read Secrets, and for sensitive systems use an external manager (Vault, cloud secret stores, or the Secrets Store CSI driver). And never commit real secret values to Git; this is the same never-leak-credentials discipline as the security and auth chapters.
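A short demonstration of why base64 offers no protection, using the example value from the Secret above: anyone who can read the encoded form can reverse it with no key.

```python
import base64

secret = "postgres://app:secret@db:5432/app"

encoded = base64.b64encode(secret.encode()).decode()
print(encoded)  # what a Secret's `data` field holds

decoded = base64.b64decode(encoded).decode()
print(decoded)  # the original URL, recovered without any key
```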

17

Health Probes & Resource Management

For self-healing and zero-downtime to work, Kubernetes needs to know two things about every pod: is it alive? and is it ready for traffic? It can't guess; your app must tell it, via probes. And to schedule and isolate fairly, it needs to know how much CPU and memory each container wants, via requests and limits (the cgroups of §3, surfaced as config).

Three probes, three questions

  • Liveness — "is the process healthy, or wedged?" If it fails, Kubernetes restarts the container. Catches deadlocks and hung states a crash wouldn't.
  • Readiness — "can it serve requests right now?" If it fails, the pod is pulled from its Service's rotation (§15) but not restarted. This is what enables zero-downtime rollouts and backpressure: a pod warming up or briefly overloaded stops receiving traffic until it recovers.
  • Startup — "has a slow-starting app finished booting?" Guards the other probes so a long cold start isn't mistaken for failure.
Liveness vs readiness
Restart on liveness failure; stop sending traffic on readiness failure
[Diagram: the kubelet probes the pod (GET /healthz · /ready). Liveness fails → RESTART the container; readiness fails → remove from Service rotation.]
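On the application side, the two endpoints might be served like this. It is a minimal sketch using only the standard library: the `ready` flag, `probe_status`, and `start_probe_server` are hypothetical names, and a real app would usually add these routes to its existing HTTP server rather than run a second one.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

ready = threading.Event()  # set once the app can serve traffic


def probe_status(path: str) -> int:
    """HTTP status to return for a probe request."""
    if path == "/healthz":
        return 200  # the process is up and able to answer
    if path == "/ready":
        return 200 if ready.is_set() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args) -> None:
        pass  # keep probe traffic out of the logs


def start_probe_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve the probe endpoints on a background thread."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An app would call `start_probe_server()` at boot and `ready.set()` only after its dependencies (database connections, caches) are usable, so `/healthz` answers from the start while `/ready` returns 503 until then.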

Requests & limits

A request is the amount the scheduler reserves for a container (and uses to pick a node); a limit is the hard ceiling the runtime enforces via cgroups. Exceed a CPU limit and the container is throttled; exceed a memory limit and it's killed (OOMKilled). Setting these well is the difference between a stable cluster and noisy-neighbor chaos — and they feed autoscaling (§18).

probes + resources in the container spec
# ...inside the Deployment's containers: entry (§14)
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }   # wedged? → restart
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }      # ready for traffic? → in/out of rotation
            periodSeconds: 5
          resources:
            requests: { cpu: "100m", memory: "128Mi" } # scheduler reserves this
            limits:   { cpu: "500m", memory: "256Mi" } # cgroup ceiling; over memory → OOMKilled
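The quantity strings above use Kubernetes units: `m` means millicores (thousandths of a CPU) and `Mi` means mebibytes. A small sketch that converts them, handling only the suffixes shown here plus `Ki` and `Gi` (the real format accepts more); the function names are hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_cores(quantity: str) -> float:
    """'500m' -> 0.5 cores; '2' -> 2.0 cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """'128Mi' -> 134217728 bytes; a bare number is taken as bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(cpu_to_cores("100m"), cpu_to_cores("500m"))        # 0.1 0.5
print(memory_to_bytes("128Mi"), memory_to_bytes("256Mi"))  # 134217728 268435456
```

Read this way, the spec above reserves a tenth of a core and 128 MiB for scheduling, and allows bursts up to half a core and 256 MiB before throttling or an OOM kill.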
Probes pair with graceful shutdown

Probes are only half of clean lifecycle handling. The other half is the graceful-shutdown chapter: on a rollout or scale-down, Kubernetes sends SIGTERM and waits a grace period before SIGKILL. Your app should fail its readiness probe immediately (stop new traffic), finish in-flight requests, then exit — so no request is dropped mid-flight. Probes decide what gets traffic; graceful shutdown decides how a pod leaves.
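The shutdown sequence described above can be sketched like this. The `ready` flag, the `InFlight` counter, and the function names are hypothetical; the sketch assumes each request handler runs inside `with in_flight:` and that `/ready` reports the state of `ready`.

```python
import signal
import threading

ready = threading.Event()  # what the /ready endpoint would report
ready.set()


class InFlight:
    """Counts requests currently being handled."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self) -> "InFlight":
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, *exc) -> None:
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_idle(self, timeout: float) -> bool:
        """Block until no request is in flight; False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


in_flight = InFlight()


def begin_shutdown(signum=None, frame=None) -> None:
    """SIGTERM handler: fail readiness so no new traffic is routed here."""
    ready.clear()


def install_handler() -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, begin_shutdown)
```

After SIGTERM, the main thread would call `in_flight.wait_idle(timeout)` with a timeout shorter than the pod's termination grace period (30 seconds by default) and then exit, so the process leaves before SIGKILL arrives.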

18

Scaling & Rollouts

Two everyday operations close out orchestration: handling more load (scaling) and shipping new versions without downtime (rollouts). Both lean on everything built so far — replicas, Services, readiness probes — and both are the Kubernetes expression of topics from the scaling chapter.

Horizontal scaling & autoscaling

Horizontal scaling means running more pod replicas (preferred over making one pod bigger — the horizontal-vs-vertical theme from the scaling chapter). You can set the count manually, or hand it to the Horizontal Pod Autoscaler (HPA), which watches a metric (typically CPU) and adjusts replicas to hold a target — scaling out under load and back in when it subsides. Because app pods are stateless (§9), any replica can serve any request, so adding pods just works.

hpa.yaml — autoscale on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }  # keep avg CPU ~70%
# Requires resource requests (§17) — utilization is measured against the request.
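The core of the autoscaler's decision is the documented formula desired = ceil(current × currentMetric / targetMetric), clamped to the min and max. A sketch using the values from the manifest above; the function name is hypothetical, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
def desired_replicas(
    current: int,
    utilization: int,      # observed average CPU, as % of the request
    target: int = 70,      # averageUtilization from the HPA spec
    min_replicas: int = 3,
    max_replicas: int = 20,
) -> int:
    """ceil(current * utilization / target), clamped to [min, max]."""
    raw = (current * utilization + target - 1) // target  # integer ceiling
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 140))   # 6: load is double the target, so double the pods
print(desired_replicas(10, 35))   # 5: half the target, so scale in
print(desired_replicas(10, 300))  # 20: capped at maxReplicas
print(desired_replicas(3, 10))    # 3: never below minReplicas
```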

Rolling updates & rollback

When you change the image tag and re-apply, a Deployment performs a rolling update by default: it brings up new-version pods a few at a time, waits for each to pass its readiness probe (§17) before sending it traffic, and only then retires old pods. At every moment enough healthy pods are serving, so users see no downtime. If new pods fail readiness, the rollout stalls instead of taking the app down — and one command rolls back to the last good version.

Rolling update
Replace replicas gradually — always keep a healthy serving set
[Diagram: start: v1 v1 v1 → mid-rollout: v2 v1 v1 (+ v2 warming) → done: v2 v2 v2.] A new pod only takes traffic after passing readiness; old pods drain gracefully (§17). If v2 never becomes Ready, the rollout halts and capacity is never sacrificed.
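How many pods "a few at a time" means is governed by the Deployment's `maxSurge` and `maxUnavailable` settings, which both default to 25%. The sketch below computes the resulting bounds; surge is rounded up and unavailable is rounded down, as the Kubernetes documentation describes. The function name is hypothetical.

```python
def rollout_bounds(
    replicas: int,
    max_surge_pct: int = 25,
    max_unavailable_pct: int = 25,
) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = -(-replicas * max_surge_pct // 100)          # percentage rounded up
    unavailable = replicas * max_unavailable_pct // 100  # percentage rounded down
    return replicas - unavailable, replicas + surge


print(rollout_bounds(3))   # (3, 4)
print(rollout_bounds(10))  # (8, 13)
```

With 3 replicas the defaults allow one extra pod and no loss of serving pods, which matches the picture above; with 10 replicas the same defaults allow up to 2 pods to be unavailable at once, so tighten `maxUnavailable` if that dip matters.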
It all comes together here

Zero-downtime deploys aren't one feature — they're the composition of replicas (§14), readiness probes (§17), graceful shutdown, and the Service abstraction (§15) keeping a stable address over a shifting pod set. Each piece earned its place; the rollout is where they pay off.

Part V · CI/CD & Production
19

CI/CD Pipelines

Containers make the artifact reproducible; CI/CD makes the path from a code commit to that artifact running in production automatic. CI (Continuous Integration) is the build-and-verify half — on every push, build the image, run tests, scan it. CD (Continuous Delivery/Deployment) is the ship half — push the image to a registry and roll it out. The whole point is to remove manual, error-prone steps so deploys are frequent, small, and boring.

The pipeline
From commit to running pod, with gates along the way
[Diagram: commit (git push) → build (image, §6) → test (unit/integ) → scan (CVEs) → push (registry, §8) → deploy (rollout, §18).] Each stage is a gate: a failed test or a critical CVE stops the pipeline before anything reaches production.
.github/workflows/deploy.yaml — a representative pipeline
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  build-test-deploy:
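    runs-on: ubuntu-latest
    env:                                             # assumed names; define them in the repo settings
      REGISTRY: ${{ vars.REGISTRY }}                 # registry host, from a repository variable
      REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}  # registry credential, from a repository secret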
    steps:
      - uses: actions/checkout@v4

      - name: Run tests              # CI gate — must pass before we ship (§testing chapter)
        run: make test               # go test ./...  |  pytest

      - name: Build image
        run: docker build -t $REGISTRY/myapp:${{ github.sha }} .   # tag = commit SHA (§8)

      - name: Scan image for vulnerabilities
        run: trivy image --severity HIGH,CRITICAL --exit-code 1 $REGISTRY/myapp:${{ github.sha }}

      - name: Push to registry
        run: |
          echo "$REGISTRY_TOKEN" | docker login $REGISTRY -u ci --password-stdin
          docker push $REGISTRY/myapp:${{ github.sha }}

      - name: Deploy to Kubernetes   # update the Deployment's image → triggers a rolling update (§18)
        # assumes kubectl on the runner is already authenticated to the cluster
        run: kubectl set image deployment/api api=$REGISTRY/myapp:${{ github.sha }}
Why tag by commit SHA

Tagging the image with the git SHA (and deploying that exact tag) gives a precise, immutable link from a running pod back to the source that built it — the reproducibility goal from §8, end to end. When something breaks in prod you know exactly which commit is running, and rollback (§18) is unambiguous.

20

Deployment Strategies & GitOps

The rolling update (§18) is the default, but riskier releases call for strategies that limit blast radius, and mature teams change how deploys are triggered altogether with GitOps.

Strategy | How it works | Trade-off
Rolling | Replace pods gradually (the default) | Simple, zero-downtime; old + new run briefly together
Blue-green | Stand up the full new version (green) beside old (blue), then flip all traffic at once | Instant switch & instant rollback; needs double the capacity during the cutover
Canary | Send a small % of traffic to the new version, watch metrics, then ramp up | Safest for risky changes; more orchestration/observability needed
Blue-green vs canary
Flip all at once, or trickle traffic and watch
[Diagram, blue-green (flip 100%): blue v1 and green v2 run side by side; the router switches blue→green instantly; problem? flip back to blue at once.]
[Diagram, canary (trickle then ramp): v1 takes 95%, v2 takes 5%; watch v2's error rate & latency; healthy → 5% → 25% → 100%; bad → route 0% to v2, so the blast radius is tiny.]
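The canary ramp can be sketched as a small decision function. The steps come from the diagram; the function name and the 1% error-rate threshold are assumptions, and a real setup would normally judge latency and other metrics as well.

```python
STEPS = [5, 25, 100]  # percent of traffic sent to v2 at each stage


def next_weight(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Next traffic share for v2: ramp up if healthy, drop to 0 if not."""
    if error_rate > max_error_rate:
        return 0  # roll back: all traffic returns to v1
    later = [step for step in STEPS if step > current]
    return later[0] if later else current  # already at 100%: stay there


print(next_weight(0, 0.0))     # 5
print(next_weight(5, 0.002))   # 25
print(next_weight(25, 0.002))  # 100
print(next_weight(5, 0.2))     # 0
```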

GitOps — Git as the source of truth

The pipeline in §19 pushes changes to the cluster. GitOps inverts that: the desired state of the cluster lives entirely in a Git repository (all those YAML manifests), and an in-cluster agent (Argo CD, Flux) continuously pulls from Git and reconciles the cluster to match. This is the reconciliation loop of §11 extended to deployments themselves: Git is desired state, the cluster is actual state, the agent closes the gap. You deploy by merging a pull request; you roll back with git revert; and the repo is a complete, audited history of every change to production.
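What the agent does on each sync can be sketched as a diff between two maps of manifests. This is a toy model: the `plan` function and the dict shapes are hypothetical, and deleting resources that are absent from Git (pruning) is typically an opt-in setting in real tools.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Actions needed to make the cluster (actual) match Git (desired)."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))  # pruning
    return actions


git = {
    "deployment/api": {"image": "myapp:1.4.1", "replicas": 3},
    "service/api": {"port": 80},
}
cluster = {
    "deployment/api": {"image": "myapp:1.4.0", "replicas": 3},
    "configmap/old": {"LOG_LEVEL": "debug"},
}

print(plan(git, cluster))
# [('update', 'deployment/api'), ('create', 'service/api'), ('delete', 'configmap/old')]
```

A rollback is the same operation: `git revert` changes the desired map, and the next sync produces the actions that undo the release.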

The throughline

Notice the same idea at three scales: a container keeps the artifact identical, a Deployment keeps the running pods matching a spec, and GitOps keeps the whole cluster matching Git. Declare desired state, let a loop reconcile reality — it's containers and Kubernetes all the way down.

21

Production Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

Concept | One-liner
Container | An isolated process: namespaces (what it sees) + cgroups (what it uses) + an image filesystem.
vs VM | Containers share one host kernel; VMs each ship a full OS. Lighter, faster, weaker isolation.
Image vs container | Image = immutable template (class); container = running instance (object).
Layers | Images are stacked read-only layers, shared & cached; the container's writable top is ephemeral.
Dockerfile | Recipe; each instruction = a layer. RUN at build time, CMD at run time.
Multi-stage | Build in a heavy stage, copy only the artifact into a tiny final image.
Cache order | Deps before code: a changed layer rebuilds everything below it.
Registry & tags | Push/pull images by name:tag; pin a version or SHA, never :latest.
Volumes | Data that must survive lives in a volume, not the container's writable layer.
Compose | Declarative multi-container stack on one host, for local dev.
Orchestration | Scheduling, self-healing, scaling, discovery, rollouts across many nodes.
Declarative | Declare desired state; a control loop reconciles actual toward it. The whole model.
Pod | Smallest unit; one+ containers sharing network & storage. Mortal, gets a new IP when replaced.
Deployment | Keeps N replicas of a pod running; self-heals; does rolling updates.
Service | Stable IP/DNS load-balancing across matching pods: service discovery.
Ingress | One L7 entry point: host/path routing + TLS to many Services.
Config/Secrets | Inject at run time (ConfigMap/Secret); same image everywhere. Secrets aren't encrypted by default.
Probes | Liveness → restart; readiness → in/out of rotation; startup → guard slow boots.
Requests/limits | Request = reserved (scheduling); limit = cgroup ceiling. Over memory → OOMKilled.
Scaling | Run more replicas; HPA autoscales on a metric (needs requests set).
Rollout/rollback | Gradual replace gated by readiness; kubectl rollout undo reverts.
CI/CD | Commit → build → test → scan → push → deploy, automatically; tag by SHA.
Strategies | Rolling (default), blue-green (instant flip), canary (trickle & watch).
GitOps | Git is desired state; an agent pulls & reconciles. Deploy by PR, roll back by revert.

The whole topic in one breath: a container packages your app with its entire environment so "what runs in prod" equals "what you tested" (§1), achieved with Linux namespaces and cgroups, not a heavy VM (§2–3). You build an image as cached, shareable layers (§4) from a Dockerfile (§5), shrink and harden it with multi-stage builds and good cache ordering (§6–7), then push a SHA-tagged image to a registry (§8). Locally you wire several containers with networks, volumes, and Compose (§9–10). At fleet scale, Kubernetes reconciles declared desired state onto nodes (§11–12): Deployments keep replicas of pods alive and self-healing (§13–14), Services and Ingress give stable addressing and routing (§15), ConfigMaps/Secrets inject per-environment config (§16), probes and resource requests/limits drive health and scheduling (§17), and HPA plus rolling updates deliver scaling and zero-downtime releases (§18). CI/CD automates commit→build→test→scan→push→deploy (§19), and blue-green/canary/GitOps make releases safe and auditable (§20). One idea recurs at every layer: declare the desired state and let a loop keep reality matching it.

Grounded in the Docker & Kubernetes docs · OCI image-spec · the Twelve-Factor App · Go 1.22+ / Python 3.11+ application examples.