Containers, shipping your code and its world together.
A first-principles walkthrough of how backend code gets packaged and run in production — from the kernel features that make a container (namespaces and cgroups), through Docker images, layers and multi-stage builds, into Kubernetes orchestration (pods, deployments, services, probes, autoscaling), and out to CI/CD pipelines and zero-downtime rollouts. Written to explain not just what each piece does but why it exists and how it works underneath. Application examples in both Go and Python.
The Problem Containers Solve
Every container concept later in this manual is an answer to one recurring pain: code that runs on one machine fails on another because the two machines aren't identical. Your laptop has Python 3.11, the server has 3.9; your build assumes a system library that prod lacks; an environment variable is set here and unset there. The result is the oldest joke in software — "but it works on my machine" — and it is not a joke, it's a class of outages.
The root cause is that a running program is never just its source code. It's the code plus a whole invisible environment: the language runtime and its exact version, third-party libraries, system packages, configuration, file paths, and OS-level dependencies. Traditionally you reproduced that environment by hand on every machine — install steps, setup scripts, a wiki page nobody kept current — and any drift between environments became a bug that only appears in one place.
Three problems, one solution
- Dependency hell & drift: the exact runtime and libraries are baked in, so there are no "install steps" to get wrong and no version skew between environments.
- Isolation: two apps that need conflicting versions of the same library can run side by side on one host, each in its own sealed box, without interfering.
- Density & portability: containers are light enough to pack many onto a single machine and to move freely between laptop, CI, and any cloud. This is the foundation that orchestration (Part IV) and modern CI/CD (Part V) build on.
A container makes "the thing that runs in production" identical to "the thing you built and tested." Everything else — images, Docker, Kubernetes, pipelines — is machinery for building, shipping, and running that identical artifact at scale.
Containers vs Virtual Machines
The instinctive question is "isn't this just a virtual machine?" The answer reveals what containers actually are. A VM virtualizes hardware: a hypervisor emulates a whole machine, and each VM runs its own complete operating system — its own kernel, on top of which sit its libraries and your app. A container virtualizes the operating system: all containers on a host share the host's single kernel, and isolation is enforced by kernel features rather than by emulating separate hardware.
| Dimension | Virtual Machine | Container |
|---|---|---|
| Isolates by | Emulating hardware (hypervisor) | Kernel features (namespaces + cgroups) |
| OS per instance | Full guest OS + own kernel | Shares the host kernel; no guest OS |
| Size | Gigabytes | Megabytes |
| Start time | Seconds to minutes (boot an OS) | Milliseconds (start a process) |
| Density per host | Tens | Hundreds to thousands |
| Isolation strength | Stronger (separate kernels) | Weaker (shared kernel) — a real tradeoff |
Sharing one kernel is what makes containers light and is their main security caveat: a kernel-level escape affects every container on the host, whereas VMs have a thicker boundary. This is why multi-tenant platforms often run containers inside VMs, and why container security (§17, and the security chapter) focuses on least privilege, non-root users, and dropping capabilities. Containers are isolation, not a hard security sandbox by default.
What a Container Actually Is
Demystified: a container is just a normal Linux process (or a few) that the kernel has been told to isolate. There is no "container" object in the kernel — the magic is three older Linux features combined so a process believes it has a machine to itself. Understanding these three removes all the mystery and explains every limit and behavior you'll hit.
1 · Namespaces — the illusion of being alone
A namespace partitions a global kernel resource so the process only sees its own slice. The
PID namespace makes the container's main process believe it's PID 1 with no siblings; the network namespace
gives it its own interfaces and ports; the mount namespace gives it its own view of the filesystem. From
inside, it looks like a private machine. From the host, it's just processes with extra labels —
ps on the host sees them all.
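The "extra labels" can be inspected directly. The sketch below is a minimal illustration, assuming a Linux host with /proc mounted: it lists the namespace identifiers of the current process. The function name is made up for this example. Run it on the host and again inside a container, and the identifiers should differ for each namespace the container does not share with the host.

```python
import os

def current_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, or /proc is not mounted
    for name in sorted(names):
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries may need extra privileges to read
    return result

for name, ident in current_namespaces().items():
    print(f"{name:20} {ident}")
```

Two processes that print the same identifier for `net` share a network namespace; different identifiers mean each sees its own interfaces and ports.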
2 · cgroups — enforcing limits
Namespaces hide the rest of the system; control groups (cgroups) cap how much the process may
consume — CPU, memory, I/O, process count. This is what stops one container from starving its neighbors,
and it's exactly the mechanism Kubernetes drives when you set resource requests and limits
(§17). Exceed a memory limit and the kernel's OOM killer terminates the container — the famous
OOMKilled status.
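A process can read its own memory ceiling from the cgroup filesystem. This is a small sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (the common layout on current distributions; cgroup v1 hosts use different paths). The helper names are illustrative.

```python
from pathlib import Path

def parse_memory_max(text):
    """Parse the contents of a cgroup v2 memory.max file: bytes, or None for 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)

def read_memory_limit(path="/sys/fs/cgroup/memory.max"):
    """Return this process's memory limit in bytes, or None if unlimited or unknown."""
    try:
        return parse_memory_max(Path(path).read_text())
    except (OSError, ValueError):
        return None  # no cgroup v2 file here (e.g. macOS, or a cgroup v1 host)

limit = read_memory_limit()
print("memory limit:", "none visible" if limit is None else f"{limit // 2**20} MiB")
```

Inside a container started with a memory limit, this would typically print that limit; on an unconstrained host it reports none.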
3 · A filesystem image — its own root
Finally the process needs files: its own / with a runtime, libraries, and your binary. That comes
from the image, layered onto the mount namespace as the container's root filesystem (§04).
The process can't see the host's files, only the image's — plus whatever you explicitly mount in (§09).
Namespaces and cgroups are Linux kernel features, so containers are natively a Linux technology. Docker on macOS or Windows quietly runs a lightweight Linux VM and your containers live inside it — which is why a "container" on your Mac is really Linux-in-a-VM under the hood. The artifact is portable; the kernel features it needs are Linux's.
Images, Layers & the Union Filesystem
Two words get confused constantly, so nail them first. An image is the immutable, on-disk template: a packaged filesystem plus metadata (what to run, which ports, env). A container is a running instance of an image: the image brought to life as an isolated process (§03). The relationship is exactly class-to-object: one image, many containers.
Images are stacks of read-only layers
The clever part is how an image stores its filesystem: as a stack of layers, each
one a set of file changes, stacked by a union filesystem that merges them into a single
coherent /. Each filesystem-changing instruction in your build (RUN, COPY, ADD) adds one layer; instructions such as ENV, EXPOSE, and CMD only record metadata (§05). Layers are immutable and
content-addressed, so identical layers are stored once and shared across images:
if ten images use the same base, that base lives on disk and travels over the network only once.
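The two ideas in this paragraph, content addressing and the union merge, can be modeled in a few lines. This is a toy sketch, not how a real storage driver works: layers here are plain dictionaries rather than tar archives, and a `None` value stands in for the whiteout marker that records a deletion.

```python
import hashlib
import json

class LayerStore:
    """Toy content-addressed store: a layer's ID is the hash of its contents."""

    def __init__(self):
        self.blobs = {}

    def add(self, files):
        # Serialize deterministically so identical content gives an identical digest.
        payload = json.dumps(files, sort_keys=True).encode()
        digest = "sha256:" + hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, files)  # stored once, however many images use it
        return digest

def union(store, layer_ids):
    """Merge layers bottom to top; upper layers win, None marks a deleted file."""
    merged = {}
    for layer_id in layer_ids:
        for path, content in store.blobs[layer_id].items():
            if content is None:
                merged.pop(path, None)
            else:
                merged[path] = content
    return merged

store = LayerStore()
base = {"/bin/sh": "shell", "/etc/os-release": "debian"}
image_a = [store.add(base), store.add({"/app/server": "go binary"})]
image_b = [store.add(base), store.add({"/app/main.py": "python source", "/bin/sh": None})]

print(len(store.blobs))             # 3 blobs, not 4: the base is shared
print(sorted(union(store, image_b)))
```

Both images reference the same base digest, so the store holds it once. The second image's top layer hides `/bin/sh` from the merged view without touching the shared base.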
Anything a container writes to its own filesystem dies with the container. Logs, uploads, database files written "inside" are gone on restart. Persistent data must live in a volume (§09) or an external service. This ephemerality is a feature — it's what makes containers disposable and replaceable — but it's the most common beginner trap.
The Dockerfile
A Dockerfile is the recipe that builds an image: a sequence of instructions, executed top to
bottom, each producing a layer or a metadata change (§04). docker build reads it, runs each step, and
caches the result. Because the file is plain text in your repo, your build is reproducible and reviewable:
the environment becomes code. Here is the same web service containerized in Go and in Python:
# FROM picks the base image — the first (bottom) layer. Pin a version, never :latest
FROM golang:1.22
# WORKDIR sets the working dir for the instructions that follow (and at runtime)
WORKDIR /app
# Copy dependency manifests FIRST and download — this layer caches across code edits (§7)
COPY go.mod go.sum ./
RUN go mod download
# Now copy the source and build the binary
COPY . .
RUN go build -o /server ./cmd/server
# Document the port the app listens on (metadata; publishing happens at run time §8)
EXPOSE 8080
# CMD is the default command run when a container starts from this image
CMD ["/server"]

# Pin a slim base: smaller and fewer CVEs than the full image
FROM python:3.11-slim
WORKDIR /app
# Copy the manifest FIRST so the dependency layer caches across code edits (§7)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Then copy the application code
COPY . .
EXPOSE 8080
# Exec form (JSON array) — runs the process directly as PID 1, so signals work (§ graceful shutdown)
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

| Instruction | What it does |
|---|---|
| FROM | The base image, the starting layer. Everything builds on it. |
| WORKDIR | Sets (and creates) the current directory for later steps and at runtime. |
| COPY / ADD | Copy files from your build context into the image. Prefer COPY; ADD has surprising URL/tar behavior. |
| RUN | Execute a command at build time (install deps, compile); bakes the result into a layer. |
| ENV | Set environment variables baked into the image (config defaults, §16). |
| EXPOSE | Documents the listening port. Informational; doesn't publish it (§08). |
| ENTRYPOINT / CMD | What runs when the container starts. ENTRYPOINT = the executable, CMD = default args. |
RUN vs CMD
Two frequent confusions. The build context is the directory you point docker build
at — everything in it is sent to the builder, so a bloated context (e.g. node_modules, a
.git folder) slows builds; trim it with .dockerignore (§07). And RUN
executes at build time (baked into the image) while CMD/ENTRYPOINT
execute at run time (when a container starts). Mixing them up is a classic early mistake.
Multi-Stage Builds
The single most impactful technique for production images. The problem: building needs heavy tools (compilers, the full SDK, build-time dev dependencies) that the running app doesn't. Ship them and your image is huge and full of attack surface. A multi-stage build uses one stage to build and a second, clean stage that copies only the finished artifact — the toolchain is discarded.
# ---- Stage 1: build with the full Go toolchain ----
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# static binary, no libc dependency — perfect for a scratch/distroless final image
RUN CGO_ENABLED=0 go build -o /server ./cmd/server
# ---- Stage 2: tiny runtime, no compiler, non-root ----
FROM gcr.io/distroless/static:nonroot
# copy ONLY the artifact across stages
COPY --from=builder /server /server
EXPOSE 8080
# never run as root (§17, security)
USER nonroot:nonroot
ENTRYPOINT ["/server"]
# Result: a ~10-15 MB image with no shell, no package manager, minimal CVEs

# ---- Stage 1: install deps into a venv with build tools available ----
FROM python:3.11 AS builder
WORKDIR /app
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"
COPY requirements.txt .
# may compile native wheels here
RUN pip install --no-cache-dir -r requirements.txt
# ---- Stage 2: slim runtime, copy the ready venv, run non-root ----
FROM python:3.11-slim
# copy ONLY the built virtualenv
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
COPY . .
RUN useradd -m app && chown -R app /app
# non-root (§17)
USER app
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Go compiles to a single static binary, so the final stage can be near-empty (distroless/scratch).
Python needs the interpreter, so the win comes from copying a pre-built virtualenv into a -slim
base instead of carrying the compilers.
Smaller images pull faster (quicker autoscaling and rollouts — §18), cost less to store and transfer, and shrink the attack surface dramatically — a distroless image has no shell for an attacker to use. Multi-stage builds are the default for any serious service image.
Image Optimization & Layer Caching
Two levers make builds fast and images lean, and both follow directly from the layer model (§04). The first is cache ordering; the second is base-image choice plus pruning what you copy in.
Order instructions from least- to most-frequently-changed
Docker caches each layer and reuses it on the next build as long as that instruction and everything before it are unchanged. The moment one layer's inputs change, that layer and every layer after it are rebuilt. So the golden rule is: put the things that rarely change (installing dependencies) before the things that change every commit (copying your source). That's why every Dockerfile above copies the dependency manifest and installs before copying the full source — you edit code constantly but dependencies rarely, so the expensive install layer stays cached.
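The invalidation rule is easy to see in a small model. The sketch below is a simplification under stated assumptions: each step's cache key is derived from its parent's key, its instruction text, and a fingerprint of the files it reads. The fingerprints (`commit-1`, `deps-v1`) are made-up stand-ins for file checksums, and the real builder's cache keys are more involved.

```python
import hashlib

def build(steps, cache):
    """Return the instructions that had to be rebuilt. steps: [(instruction, inputs)]."""
    rebuilt = []
    parent = ""  # cache key of the previous layer
    for instruction, inputs in steps:
        # A layer's key depends on its parent, so one miss invalidates everything after it.
        key = hashlib.sha256(f"{parent}|{instruction}|{inputs}".encode()).hexdigest()
        if key not in cache:
            cache.add(key)
            rebuilt.append(instruction)
        parent = key
    return rebuilt

def dockerfile(source_version, deps_version="deps-v1"):
    return [
        ("FROM python:3.11-slim", ""),
        ("COPY requirements.txt .", deps_version),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", source_version),
    ]

cache = set()
first = build(dockerfile("commit-1"), cache)             # cold cache: every step runs
code_edit = build(dockerfile("commit-2"), cache)         # only the final COPY reruns
new_dep = build(dockerfile("commit-2", "deps-v2"), cache)  # install and everything after rerun
print(len(first), code_edit, len(new_dep))
```

With the manifest copied before the source, a code edit rebuilds one cheap layer. If `COPY . .` came first, every edit would invalidate the install step as well.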
Pick a small base & copy less
- Base image: -slim variants drop hundreds of MB; alpine is tiny (musl libc, watch for compatibility); distroless ships only your app and its runtime with no shell or package manager (most secure). Choose the smallest that runs your app.
- .dockerignore: keeps junk out of the build context and the image (faster builds, smaller images, no secrets leaking in).
- Combine related RUNs & clean up in the same layer: deleting a cache in a later layer doesn't shrink the image (the bytes still live in the earlier layer); clean within the same RUN.
.git
node_modules
*.md
# never bake secrets into an image (§16)
.env
**/__pycache__
*.log
dist/
.vscode/
RUN apt-get install ... on one line and RUN rm -rf /var/lib/apt/lists/* on the next
does not reduce size — the files still exist in the install layer. Chain them:
RUN apt-get update && apt-get install -y X && rm -rf /var/lib/apt/lists/* so the
cleanup happens before the layer is sealed.
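The arithmetic behind this is simple enough to write down. In the sketch below the sizes are invented for illustration; the point is only that an image's size is the sum of what each layer adds, and a later deletion subtracts nothing.

```python
def image_size(layers):
    """Sum what each layer adds; a negative delta (a later delete) frees nothing."""
    return sum(max(delta_mb, 0) for _, delta_mb in layers)

# Sizes are made-up numbers for illustration.
two_layers = [
    ("RUN apt-get update && apt-get install -y X", 120),  # 80 MB package + 40 MB apt lists
    ("RUN rm -rf /var/lib/apt/lists/*", -40),             # hides the files, frees nothing
]
one_layer = [
    ("RUN apt-get update && apt-get install -y X && rm -rf /var/lib/apt/lists/*", 80),
]
print(image_size(two_layers), image_size(one_layer))  # 120 80
```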
Runtime, Registries & Tags
You've built an image — now run it, share it, and version it. Three commands cover 90% of daily use, and three concepts (registry, tag, runtime) frame the lifecycle.
# BUILD an image from the Dockerfile in . and name:tag it
docker build -t myapp:1.4.0 .
# RUN a container from it:
# -d detached -p host:container publishes a port -e sets an env var (§16)
docker run -d -p 8080:8080 -e DATABASE_URL="$DB" myapp:1.4.0
# Inspect & debug
docker ps # running containers
docker logs -f <id> # stream stdout/stderr (your app should log there §logging)
docker exec -it <id> sh # shell INTO a running container to poke around
docker stop <id> # sends SIGTERM, then SIGKILL after a grace period (§graceful shutdown)
# SHARE via a registry
docker tag myapp:1.4.0 registry.example.com/team/myapp:1.4.0
docker push registry.example.com/team/myapp:1.4.0 # upload
docker pull registry.example.com/team/myapp:1.4.0 # download (what prod/K8s does)
Registry, repository, tag
A registry is the server that stores images (Docker Hub, GitHub Container Registry, AWS ECR,
Google Artifact Registry). A repository is a named image within it; a tag
is a label for a specific version (myapp:1.4.0). Pushing uploads your layers; pulling downloads
them — and thanks to content-addressed layers (§04), only layers the registry doesn't already have
move over the wire. This push/pull cycle is the hand-off between your build (§19) and where the image
runs (Part IV).
:latest
latest is just a tag that moves; it doesn't mean "newest" and gives you no idea what is
running. Two machines pulling :latest a day apart can get different images — the exact
non-reproducibility containers exist to kill. Tag with an immutable version or the git commit SHA
(myapp:1.4.0, myapp:git-9f2c1a) so a running container maps to exact source.
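A build script can enforce this convention. The sketch below is one possible shape, with hypothetical helper names: one function derives a tag from the commit SHA (using the `git-` prefix from the example above), and the other flags references that resolve to a moving target. The parsing is deliberately simplified and does not cover every form an image reference can take.

```python
import re

SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")

def image_ref(registry, repository, git_sha):
    """Build an image reference tagged with a short git commit SHA."""
    if not SHA_RE.match(git_sha):
        raise ValueError(f"not a git SHA: {git_sha!r}")
    return f"{registry}/{repository}:git-{git_sha[:7]}"

def is_floating(ref):
    """True if the reference has no tag or uses :latest."""
    last_segment = ref.rsplit("/", 1)[-1]  # ignore any registry host:port
    if "@" in last_segment:                # pinned by digest
        return False
    _, sep, tag = last_segment.partition(":")
    return not sep or tag == "latest"

print(image_ref("registry.example.com", "team/myapp", "9f2c1a7d03b5"))
print(is_floating("myapp:latest"), is_floating("myapp"), is_floating("myapp:1.4.0"))
```

A check like `is_floating` could run in CI to reject manifests that reference `:latest` or omit the tag.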
"Docker image" is shorthand — the format is standardized by the Open Container Initiative (OCI), which is why images built with Docker run under other runtimes (containerd, CRI-O, Podman) and on Kubernetes without Docker installed. You build to a spec, not to one vendor.
Networking & Volumes
Two needs appear the moment you run more than a toy: containers must talk to each other, and some data must outlive the container. Docker answers these with networks and volumes — and both directly address gaps left by the isolation model (§03) and the ephemeral writable layer (§04).
Networking — from isolated to connected
By default each container has its own network namespace, so it can't see others. Put containers on the same
user-defined bridge network and Docker gives them a private virtual LAN plus built-in
DNS: a container reaches another simply by its name (for example, db:5432). This is the foundation of
service discovery, and it's why Compose (§10) and Kubernetes (§15) let services address each other by
name instead of brittle IPs. Separately, port publishing (-p 8080:8080) pokes a
hole from the host into a container so the outside world can reach it.
Volumes — data that survives
Because a container's writable layer is destroyed with the container (§04), anything that must persist — a database's files, user uploads — lives in a volume: storage managed by Docker (or the host) and mounted into the container, decoupled from its lifecycle. Delete and recreate the container; the volume and its data remain. There are two flavors: named volumes (Docker-managed, the default for real data) and bind mounts (a host directory mapped in — handy in development to live-edit code without rebuilding).
| Mechanism | Use it for |
|---|---|
| Named volume -v dbdata:/var/lib/postgresql/data | Persistent app data managed by Docker: databases, caches you want to keep. |
| Bind mount -v ./src:/app/src | Dev-time live reload: edit on the host, see changes in the container instantly. |
| tmpfs | Sensitive scratch data kept in RAM only, never written to disk. |
Treat containers as disposable and keep all durable state in volumes or external managed services. In Kubernetes especially, the prevailing pattern is stateless app containers (trivially scaled and replaced — §18) with state pushed to managed databases or persistent volumes. If killing a container loses data, the design has leaked state into the wrong place.
Docker Compose
Running a real app means several containers — API, database, cache, maybe a worker (the task-queue
chapter's setup) — with the right networks, volumes, env, and start order. Wiring that by hand with many
docker run commands is tedious and error-prone. Docker Compose declares the whole
local stack in one YAML file and brings it up with a single command. It's the standard for local development and
simple single-host deployments.
services:
  api:
    build: . # build the image from the local Dockerfile (§5/6)
    ports:
      - "8080:8080" # publish to the host (§9)
    environment: # config via env, not baked in (§16)
      DATABASE_URL: postgres://app:secret@db:5432/app # reach "db" by name (§9 DNS)
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy # wait until the DB passes its healthcheck
      cache:
        condition: service_started
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - dbdata:/var/lib/postgresql/data # named volume = data survives restarts (§9)
    healthcheck: # so depends_on can wait for readiness
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      retries: 5
  cache:
    image: redis:7
volumes:
  dbdata: # declared once, referenced above
docker compose up -d # build (if needed) + start everything in the background
docker compose ps # status of all services
docker compose logs -f api # tail one service's logs
docker compose down # stop & remove containers + network (volumes kept)
docker compose down -v # ...and delete the volumes too (fresh DB)
Compose is for one host — superb for local dev and small deployments, but it doesn't
schedule across machines, self-heal, or autoscale. The instant you need many nodes, automatic restarts,
rolling updates, and horizontal scaling, you've outgrown Compose and want an orchestrator (Part IV). The
mental model carries over: a Compose service maps closely to a Kubernetes Deployment + Service.
Why Orchestration
Docker gives you containers; it does not, by itself, run them reliably across many machines. The moment you
have dozens of containers on a fleet of servers, a new set of problems appears that no docker run
can solve — and orchestration is the layer that solves them. Kubernetes (often "K8s")
is the de-facto standard.
| The problem at fleet scale | What an orchestrator does |
|---|---|
| Which machine should each container run on? | Scheduling — places containers on nodes by available CPU/memory. |
| A container (or a whole node) dies at 3am. | Self-healing — restarts crashed containers, reschedules off dead nodes, automatically. |
| Traffic spikes; you need more copies, then fewer. | Scaling — runs N replicas and adds/removes them on demand (§18). |
| Containers come and go; their IPs change. | Service discovery + load balancing — a stable address that fans out to healthy replicas (§15). |
| Ship a new version without downtime. | Rolling updates & rollback — replace replicas gradually, revert on failure (§18). |
| Config & secrets differ per environment. | Config management — inject settings/secrets without rebuilding the image (§16). |
The core idea: declarative, desired state
The mindset shift that makes Kubernetes click: you don't issue imperative commands ("start a container here"). You declare the desired state ("I want 5 replicas of this image, reachable on this address") and Kubernetes continuously works to make reality match. If a container dies, actual drops to 4, and a control loop notices the gap and starts another — without you doing anything. You describe the what; Kubernetes handles the how and keeps it true.
Everything in Kubernetes is a control loop comparing desired state (what you declared) to actual state (what's running) and taking action to close the gap. Internalize that loop and the entire system — Deployments, scaling, healing, rollouts — becomes one idea applied over and over.
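The loop itself fits in a dozen lines. This is a toy model rather than Kubernetes code: pods are just strings, and "starting" one means appending a name to a list. The function and variable names are invented for the example.

```python
import itertools

_ids = itertools.count(1)

def reconcile(desired, pods):
    """One pass of the loop: compare desired with actual and return the corrected pod list."""
    pods = list(pods)
    while len(pods) < desired:
        pods.append(f"pod-{next(_ids)}")  # too few: start another
    while len(pods) > desired:
        pods.pop()                         # too many: stop one
    return pods

pods = reconcile(5, [])      # declare 5 replicas
pods.remove(pods[2])         # a pod crashes; actual drops to 4
pods = reconcile(5, pods)    # the next pass notices the gap and closes it
print(len(pods), pods)
```

Nothing in `reconcile` knows why the count is wrong. A crash, a scale-up, and a first deploy all look the same to it: a difference between desired and actual.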
Kubernetes Architecture
A Kubernetes cluster is two kinds of machines: a control plane (the brain that decides what should run) and a set of worker nodes (the muscle where containers actually run). You talk only to the control plane — you submit desired state, and the cluster makes it happen.
| Component | Role |
|---|---|
| api-server | The single entry point. Every read/write of cluster state goes through it (your kubectl, every component). |
| etcd | Distributed key-value store holding the entire cluster state — the source of truth. |
| scheduler | Decides which node a new pod runs on, based on resource requests and constraints. |
| controller-manager | Runs the reconciliation loops (the Deployment controller, etc.) that drive actual toward desired. |
| kubelet | The agent on each node; starts/stops containers and reports their health up to the api-server. |
| kube-proxy | Programs node networking so Service virtual IPs route to the right pods (§15). |
Managed Kubernetes (GKE, EKS, AKS) operates the control plane for you, so in practice you interact with the
cluster purely through the api-server via kubectl and YAML. Knowing the parts still matters for
debugging — "pod stuck in Pending" is usually the scheduler finding no node with room (§17).
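A rough picture of the scheduler's filtering step, and of why a pod ends up Pending, is sketched below. The node data and the "most free CPU" scoring rule are assumptions for illustration; the real scheduler applies many more filters and scoring plugins.

```python
def schedule(pod, nodes):
    """Return the name of a node with room for the pod's requests, or None (Pending)."""
    fits = [
        n for n in nodes
        if n["free_cpu_m"] >= pod["cpu_m"] and n["free_mem_mi"] >= pod["mem_mi"]
    ]
    if not fits:
        return None
    # Hypothetical scoring rule: prefer the node with the most free CPU.
    best = max(fits, key=lambda n: n["free_cpu_m"])
    best["free_cpu_m"] -= pod["cpu_m"]    # reserve the request on that node
    best["free_mem_mi"] -= pod["mem_mi"]
    return best["name"]

nodes = [
    {"name": "node-a", "free_cpu_m": 300, "free_mem_mi": 512},
    {"name": "node-b", "free_cpu_m": 900, "free_mem_mi": 256},
]
first = schedule({"cpu_m": 500, "mem_mi": 128}, nodes)
second = schedule({"cpu_m": 500, "mem_mi": 128}, nodes)
print(first, second)  # node-b None
```

The second pod stays unscheduled because no node has 500m of CPU left to reserve, which is the situation behind a pod stuck in Pending.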
Pods
The pod is Kubernetes' atomic unit — the smallest thing it schedules. Crucially, you do
not deploy containers directly; you deploy pods, and a pod wraps one or more containers that
are always co-located on the same node and share a network namespace and storage. Containers
in a pod reach each other on localhost and can share mounted volumes — they're a single
cooperating unit.
Usually a pod holds exactly one container (your app). The multi-container case is the sidecar pattern: a helper alongside the main app — a log shipper, a metrics exporter, or a service-mesh proxy (the Envoy from the gRPC chapter's §19). The sidecar shares the pod's network so it can observe or proxy the app's traffic transparently.
A pod is ephemeral: it can be killed and replaced at any time (node failure, scaling down, a rollout), and the replacement gets a new IP. So you never hard-code a pod's IP and never rely on a specific pod surviving. This is exactly why Services exist (§15) — to provide a stable address in front of a constantly-churning set of pods — and why you let a Deployment manage pods rather than creating them directly (§14).
Deployments & ReplicaSets
You almost never create pods by hand — a bare pod isn't replaced if it dies. Instead you declare a Deployment: "keep N replicas of this pod template running, always." The Deployment manages a ReplicaSet, whose one job is to maintain the replica count, and the ReplicaSet manages the pods. This is the reconciliation loop (§11) made concrete: declare 5, a pod dies, actual becomes 4, the controller starts a new one — self-healing, for free.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3 # desired state: always keep 3 pods
  selector:
    matchLabels: { app: api } # which pods this Deployment owns
  template: # the POD TEMPLATE stamped out for each replica
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/team/myapp:1.4.0 # pinned tag, never :latest (§8)
          ports:
            - containerPort: 8080
          # resources, env, and probes go here too; see §16 and §17
kubectl apply -f deployment.yaml # submit desired state (declarative)
kubectl get pods -l app=api # see the 3 pods it created
kubectl scale deployment/api --replicas=10 # change desired count → controller adds 7
kubectl rollout status deployment/api # watch a new version roll out (§18)
kubectl rollout undo deployment/api # roll back to the previous version (§18)
Deployment is for stateless apps (the common case). Siblings exist for other shapes: StatefulSet for stateful apps needing stable identity/storage (databases), DaemonSet to run one pod per node (log/metrics agents), and Job/CronJob for run-to-completion and scheduled tasks (the batch side of the task-queue chapter). Same reconciliation idea, different guarantees.
Services & Ingress
Pods are mortal and their IPs churn (§13), so you can't point traffic at a pod. A Service
is the fix: a stable virtual IP and DNS name in front of a dynamic set of pods, with built-in load
balancing. It uses a label selector to track "all pods with app: api" — as
pods come and go, the Service's set of healthy endpoints updates automatically, and callers never notice. This
is Kubernetes' service discovery, and it's why one service addresses another by name (http://api),
exactly as Compose did on one host (§9).
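The selector mechanism can be sketched as a filter over pods followed by a choice of backend. The pod records and IPs below are invented, and round-robin is a simplification: how a backend is actually picked depends on the kube-proxy mode in use.

```python
import itertools

def endpoints(pods, selector):
    """IPs of ready pods whose labels include every key/value in the selector."""
    return [
        p["ip"] for p in pods
        if p["ready"] and all(p["labels"].get(k) == v for k, v in selector.items())
    ]

pods = [
    {"ip": "10.0.0.4", "labels": {"app": "api"}, "ready": True},
    {"ip": "10.0.0.7", "labels": {"app": "api"}, "ready": False},  # failing readiness
    {"ip": "10.0.0.9", "labels": {"app": "worker"}, "ready": True},
    {"ip": "10.0.1.2", "labels": {"app": "api"}, "ready": True},
]
backends = endpoints(pods, {"app": "api"})
print(backends)

# Round-robin here is an assumption made for the sketch.
rr = itertools.cycle(backends)
picks = [next(rr) for _ in range(3)]
print(picks)
```

Callers only ever see the Service name. The list behind it is recomputed as pods appear, disappear, or change readiness.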
Service types & Ingress
A ClusterIP Service (the default) is reachable only inside the cluster — perfect for
service-to-service calls. To accept traffic from outside, you either use a LoadBalancer
Service (the cloud provisions an external load balancer, one per service — expensive at scale) or, far
more commonly for HTTP, an Ingress: a single L7 entry point that routes by host and path to
many backing Services. Ingress is where TLS termination and host/path routing live — the cluster's
public front door, and the natural home for the REST edge described in the gRPC chapter (§18–19 there).
| Exposure | Reachable from | Use for |
|---|---|---|
ClusterIP | Inside the cluster only | Service-to-service (most internal services) |
NodePort | A port on every node | Basic/dev external access |
LoadBalancer | The internet (cloud LB) | One TCP/UDP service exposed directly |
| Ingress | The internet, via L7 rules | HTTP host/path routing + TLS for many services through one entry point |
apiVersion: v1
kind: Service
metadata:
  name: api # other pods reach this as http://api (cluster DNS)
spec:
  selector: { app: api } # front the pods labelled app=api (from the Deployment §14)
  ports:
    - port: 80 # the Service port callers use
      targetPort: 8080 # the containerPort it forwards to
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edge
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api # route external HTTP to the api Service
                port: { number: 80 }
External client → Ingress (TLS, host/path routing) → Service (stable IP, load-balance) → a healthy Pod → your container. Each hop adds one concern — routing, discovery, the actual work — and together they're how a packet from the internet reaches one replica of your app.
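The first hop, host and path routing, can be approximated as a longest-prefix match. This sketch assumes `Prefix` path rules that match whole path elements; the `admin` Service and its rule are hypothetical additions to the manifest above, and real Ingress controllers differ in details.

```python
def route(rules, host, path):
    """Return the backend Service for a request, or None if no rule matches."""
    best, best_len = None, -1
    for rule in rules:
        prefix = rule["path"].rstrip("/")
        # Match per path element: /admin matches /admin and /admin/x, not /administrator.
        matches = path == prefix or path.startswith(prefix + "/")
        if rule["host"] == host and matches and len(prefix) > best_len:
            best, best_len = rule["service"], len(prefix)
    return best

rules = [
    {"host": "api.example.com", "path": "/", "service": "api"},
    {"host": "api.example.com", "path": "/admin", "service": "admin"},  # hypothetical
]
print(route(rules, "api.example.com", "/orders/42"))
print(route(rules, "api.example.com", "/admin/users"))
print(route(rules, "other.example.com", "/"))
```

The most specific rule wins, so `/admin/users` reaches the `admin` Service while everything else on that host falls through to `api`.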
Config, Secrets & Environment
An image must be environment-agnostic: the same tested image runs in dev, staging, and prod — only the configuration differs (database URLs, feature flags, credentials). So configuration is injected at run time, never baked into the image. This is the containerized form of the config-management chapter's core rule, and the classic expression of twelve-factor "config in the environment." Kubernetes provides two objects: ConfigMaps for non-secret settings and Secrets for sensitive values.
apiVersion: v1
kind: ConfigMap
metadata: { name: api-config }
data:
  LOG_LEVEL: "info"
  FEATURE_NEW_CHECKOUT: "true"
---
apiVersion: v1
kind: Secret
metadata: { name: api-secrets }
type: Opaque
stringData: # plaintext here; stored base64-encoded in etcd
  DATABASE_URL: "postgres://app:secret@db:5432/app"
---
# ...inside the Deployment's container spec (§14):
# envFrom:
#   - configMapRef: { name: api-config } # all keys → env vars
#   - secretRef: { name: api-secrets } # secret keys → env vars
# Your app then reads os.Getenv("LOG_LEVEL") / os.environ["DATABASE_URL"] (§config chapter)
By default, a Kubernetes Secret is only base64-encoded in etcd, not encrypted; base64 is encoding, not protection. For real safety enable encryption at rest for etcd, lock down RBAC so few identities can read Secrets, and for sensitive systems use an external manager (Vault, cloud secret stores, or the Secrets Store CSI driver). And never commit real secret values to Git: this is the same never-leak-credentials discipline as the security and auth chapters.
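On the application side, reading injected configuration might look like the sketch below. The key names come from the ConfigMap and Secret above; the function name, the defaults, and the fail-fast behavior are choices made for this example, not something Kubernetes prescribes.

```python
import os

def load_config(env=os.environ):
    """Read settings injected as env vars; fail fast if a required one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # required, supplied by the Secret
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set") from None
    return {
        "database_url": database_url,
        "log_level": env.get("LOG_LEVEL", "info"),  # optional, with a default
        "new_checkout": env.get("FEATURE_NEW_CHECKOUT", "false").lower() == "true",
    }

config = load_config({
    "DATABASE_URL": "postgres://app:secret@db:5432/app",
    "FEATURE_NEW_CHECKOUT": "true",
})
print(config["log_level"], config["new_checkout"])  # info True
```

Passing the environment in as a parameter keeps the loader testable. In the running container, calling `load_config()` with no arguments reads the real process environment.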
Health Probes & Resource Management
For self-healing and zero-downtime to work, Kubernetes needs to know two things about every pod: is it alive? and is it ready for traffic? It can't guess — your app must tell it, via probes. And to schedule and isolate fairly, it needs to know how much CPU and memory each container wants, via requests and limits (the cgroups of §03, surfaced as config).
Three probes, three questions
- Liveness — "is the process healthy, or wedged?" If it fails, Kubernetes restarts the container. Catches deadlocks and hung states a crash wouldn't.
- Readiness — "can it serve requests right now?" If it fails, the pod is pulled from its Service's rotation (§15) but not restarted. This is what enables zero-downtime rollouts and backpressure: a pod warming up or briefly overloaded stops receiving traffic until it recovers.
- Startup — "has a slow-starting app finished booting?" Guards the other probes so a long cold start isn't mistaken for failure.
Requests & limits
A request is the amount the scheduler reserves for a container (and uses to pick a node); a
limit is the hard ceiling the runtime enforces via cgroups. Exceed a CPU limit and the
container is throttled; exceed a memory limit and it's killed (OOMKilled).
Setting these well is the difference between a stable cluster and noisy-neighbor chaos — and they feed
autoscaling (§18).
# ...inside the Deployment's containers: entry (§14)
livenessProbe:
  httpGet: { path: /healthz, port: 8080 } # wedged? → restart
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /ready, port: 8080 } # ready for traffic? → in/out of rotation
  periodSeconds: 5
resources:
  requests: { cpu: "100m", memory: "128Mi" } # scheduler reserves this
  limits: { cpu: "500m", memory: "256Mi" } # cgroup ceiling; over memory → OOMKilled
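The quantity strings in the manifest are compact notations for plain numbers: `100m` is 100 millicores (a tenth of a CPU) and `128Mi` is 128 × 2^20 bytes. The sketch below converts the forms used above. It handles only the `m` suffix for CPU and the binary suffixes for memory, not the full quantity grammar Kubernetes accepts.

```python
def cpu_millicores(quantity):
    """'100m' -> 100, '0.5' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def memory_bytes(quantity):
    """'128Mi' -> 134217728. Handles the binary suffixes plus plain byte counts."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)

print(cpu_millicores("100m"), cpu_millicores("500m"))  # request vs limit, in millicores
print(memory_bytes("128Mi"), memory_bytes("256Mi"))    # request vs limit, in bytes
```

With these numbers, the container above is guaranteed a tenth of a CPU and may burst to half of one, while its memory may grow to twice the requested amount before it is killed.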
Probes are only half of clean lifecycle handling. The other half is the graceful-shutdown chapter: on a
rollout or scale-down, Kubernetes sends SIGTERM and waits a grace period before
SIGKILL. Your app should fail its readiness probe immediately (stop new traffic), finish
in-flight requests, then exit — so no request is dropped mid-flight. Probes decide what gets traffic;
graceful shutdown decides how a pod leaves.
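A minimal sketch of the application side, using only the standard library: `/healthz` answers as long as the process can respond, and `/ready` starts failing once shutdown has begun. The paths match the probe config above; the handler and function names are invented, and a real service would also drain in-flight requests before exiting.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

shutting_down = threading.Event()

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status = 200  # the process is responsive
        elif self.path == "/ready":
            status = 503 if shutting_down.is_set() else 200
        else:
            status = 404
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs

def serve(port=8080):
    """Start the probe server on a background thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def install_sigterm_handler():
    # On SIGTERM, start failing /ready so the Service stops routing new requests here.
    signal.signal(signal.SIGTERM, lambda signum, frame: shutting_down.set())
```

An application would call `install_sigterm_handler()` and `serve()` at startup. Liveness stays green during shutdown on purpose: the pod should be taken out of rotation, not restarted.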
Scaling & Rollouts
Two everyday operations close out orchestration: handling more load (scaling) and shipping new versions without downtime (rollouts). Both lean on everything built so far — replicas, Services, readiness probes — and both are the Kubernetes expression of topics from the scaling chapter.
Horizontal scaling & autoscaling
Horizontal scaling means running more pod replicas (preferred over making one pod bigger — the horizontal-vs-vertical theme from the scaling chapter). You can set the count manually, or hand it to the Horizontal Pod Autoscaler (HPA), which watches a metric (typically CPU) and adjusts replicas to hold a target — scaling out under load and back in when it subsides. Because app pods are stateless (§9), any replica can serve any request, so adding pods just works.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: api }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }  # keep avg CPU ~70%
# Requires resource requests (§17): utilization is measured against the request.
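The arithmetic behind "adjust replicas to hold a target" is small enough to sketch. The function below follows the scaling rule given in the Kubernetes HPA documentation, desired = ceil(current × currentMetric / targetMetric), clamped to the min/max bounds. The function name is hypothetical, the 10% tolerance is the documented default, and the real controller also applies stabilization windows and handles missing metrics, which this sketch ignores.

```python
import math

def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA decision: scale proportionally to how far we are from target."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current                      # close enough to target: do nothing
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))

# With the manifest above (target 70%, min 3, max 20):
print(desired_replicas(4, 90, 70, 3, 20))    # 4 pods at 90% CPU: scale out
print(desired_replicas(10, 72, 70, 3, 20))   # within tolerance: unchanged
print(desired_replicas(3, 20, 70, 3, 20))    # idle: clamped at minReplicas
```

The utilization figures are percentages of each pod's CPU request, which is why the HPA cannot work on a Deployment that sets no requests.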
Rolling updates & rollback
When you change the image tag and re-apply, a Deployment performs a rolling update by default: it brings up new-version pods a few at a time, waits for each to pass its readiness probe (§17) before sending it traffic, and only then retires old pods. At every moment enough healthy pods are serving, so users see no downtime. If new pods fail readiness, the rollout stalls instead of taking the app down, and one command (kubectl rollout undo) rolls back to the previous revision.
Zero-downtime deploys aren't one feature — they're the composition of replicas (§14), readiness probes (§17), graceful shutdown, and the Service abstraction (§15) keeping a stable address over a shifting pod set. Each piece earned its place; the rollout is where they pay off.
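The rolling update described above can be modeled in a few lines. This is a deliberately simplified simulation, not the Deployment controller's actual algorithm: the function name is hypothetical, max_surge and max_unavailable are given as absolute pod counts chosen for readability (Kubernetes defaults to 25% for each), and readiness is reduced to a single flag.

```python
def rolling_update(replicas: int, max_surge: int = 1, max_unavailable: int = 0,
                   new_pods_ready: bool = True) -> list:
    """Return the (old_pods, new_pods) counts after each step of a rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Surge: start new pods without exceeding replicas + max_surge in total.
        create = min(replicas - new, replicas + max_surge - (old + new))
        new += create
        # Only pods that pass readiness count as available.
        available = old + (new if new_pods_ready else 0)
        # Retire old pods only while enough available pods remain.
        retire = max(0, min(old, available - (replicas - max_unavailable)))
        old -= retire
        if create == 0 and retire == 0:
            break                           # stalled: nothing more can safely change
        history.append((old, new))
    return history

print(rolling_update(4))                        # healthy release: old pods drain one by one
print(rolling_update(4, new_pods_ready=False))  # broken release: stalls with all old pods serving
```

In the failing case the rollout stops after creating a single new pod and never retires an old one, which is the "stalls instead of taking the app down" behavior: the readiness gate is what makes a bad image a non-event for users.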
CI/CD Pipelines
Containers make the artifact reproducible; CI/CD makes the path from a code commit to that artifact running in production automatic. CI (Continuous Integration) is the build-and-verify half — on every push, build the image, run tests, scan it. CD (Continuous Delivery/Deployment) is the ship half — push the image to a registry and roll it out. The whole point is to remove manual, error-prone steps so deploys are frequent, small, and boring.
name: ci-cd
on:
  push:
    branches: [main]
env:
  # Assumed names: define these as a repository variable and a repository secret.
  REGISTRY: ${{ vars.REGISTRY }}
  REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests                  # CI gate: must pass before we ship (see the testing chapter)
        run: make test                   # go test ./... or pytest
      - name: Build image
        run: docker build -t $REGISTRY/myapp:${{ github.sha }} .   # tag = commit SHA (§8)
      - name: Scan image for vulnerabilities   # assumes trivy is installed on the runner
        run: trivy image --severity HIGH,CRITICAL --exit-code 1 $REGISTRY/myapp:${{ github.sha }}
      - name: Push to registry
        run: |
          echo "$REGISTRY_TOKEN" | docker login $REGISTRY -u ci --password-stdin
          docker push $REGISTRY/myapp:${{ github.sha }}
      - name: Deploy to Kubernetes       # update the Deployment's image → triggers a rolling update (§18)
        # Assumes kubectl on the runner is already configured with cluster credentials.
        run: kubectl set image deployment/api api=$REGISTRY/myapp:${{ github.sha }}
Tagging the image with the git SHA (and deploying that exact tag) gives a precise, immutable link from a running pod back to the source that built it — the reproducibility goal from §8, end to end. When something breaks in prod you know exactly which commit is running, and rollback (§18) is unambiguous.
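One way to enforce that convention is a small check that rejects any image reference not pinned to a full commit SHA. The sketch below is an illustration under stated assumptions: the function names and the registry.example.com host are hypothetical, and the 40-hex-character pattern assumes SHA-1 commit hashes like the ones `github.sha` provides.

```python
import re

def image_tag(image: str):
    """Return the tag of an image reference, or None if it carries no tag."""
    name = image.split("@", 1)[0]           # drop any @sha256:... digest suffix
    last = name.rsplit("/", 1)[-1]          # a ':' before the last '/' is a registry port
    return last.split(":", 1)[1] if ":" in last else None

def is_sha_pinned(image: str) -> bool:
    """True when the tag looks like a full 40-character git commit SHA."""
    tag = image_tag(image)
    return tag is not None and re.fullmatch(r"[0-9a-f]{40}", tag) is not None

sha = "3f2a9c1e" * 5                        # stand-in for a real commit SHA
print(is_sha_pinned(f"registry.example.com/myapp:{sha}"))  # pinned to a commit
print(is_sha_pinned("registry.example.com/myapp:latest"))  # mutable tag
print(is_sha_pinned("localhost:5000/myapp"))               # no tag at all
```

A check like this could run as a pipeline step or a manifest lint, so a mutable tag never reaches the cluster by accident.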
Deployment Strategies & GitOps
The rolling update (§18) is the default, but riskier releases call for strategies that limit blast radius, and mature teams change how deploys are triggered altogether with GitOps.
| Strategy | How it works | Trade-off |
|---|---|---|
| Rolling | Replace pods gradually (the default) | Simple, zero-downtime; old + new run briefly together |
| Blue-green | Stand up the full new version (green) beside old (blue), then flip all traffic at once | Instant switch & instant rollback; needs double the capacity during the cutover |
| Canary | Send a small % of traffic to the new version, watch metrics, then ramp up | Safest for risky changes; more orchestration/observability needed |
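To show what "send a small % of traffic" can mean mechanically, here is a sketch of a deterministic canary split. In practice this decision is usually made by an ingress controller, service mesh, or rollout tool rather than application code; the function name and the idea of keying on a user or request ID are assumptions for illustration.

```python
import hashlib

def route(request_key: str, canary_percent: int) -> str:
    """Pick "canary" or "stable" for a request, stably for the same key."""
    digest = hashlib.sha256(request_key.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100    # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"

# Roughly 10% of distinct keys land on the canary.
keys = [f"user-{i}" for i in range(10_000)]
share = sum(route(k, 10) == "canary" for k in keys) / len(keys)
print(f"canary share at 10%: {share:.1%}")
```

Hashing the key instead of rolling a random number keeps each user on one version for the whole experiment, and because a key's bucket never changes, anyone on the canary at 10% is still on it when you ramp to 50%.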
GitOps — Git as the source of truth
The pipeline in §19 pushes changes to the cluster. GitOps inverts that: the
desired state of the cluster lives entirely in a Git repository (all those YAML manifests), and an in-cluster
agent (Argo CD, Flux) continuously pulls from Git and reconciles the cluster to match. This is the
reconciliation loop of §11 extended to deployments themselves: Git is desired state, the cluster is actual
state, the agent closes the gap. You deploy by merging a pull request; you roll back with git revert;
and the repo is a complete, audited history of every change to production.
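The heart of such an agent is a diff between two states. The sketch below is a toy model, not how Argo CD or Flux are implemented: manifests are reduced to a dict of name → spec, the function name is hypothetical, and whether a real agent deletes resources that have disappeared from Git is typically a configurable choice.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to make `actual` (cluster) match `desired` (Git)."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))    # pruning: resource was removed from Git
    return actions

git = {"deployment/api": {"image": "myapp:abc123", "replicas": 3},
       "service/api": {"port": 80}}
cluster = {"deployment/api": {"image": "myapp:9f8e7d", "replicas": 3},
           "deployment/old-worker": {"image": "worker:1"}}

print(reconcile(git, cluster))
# [('update', 'deployment/api'), ('create', 'service/api'), ('delete', 'deployment/old-worker')]
```

Run on a timer or on every Git change, this loop is also what makes rollback trivial: reverting the commit changes the desired state, and the next pass computes the actions that undo the release.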
Notice the same idea at three scales: a container keeps the artifact identical, a Deployment keeps the running pods matching a spec, and GitOps keeps the whole cluster matching Git. Declare desired state, let a loop reconcile reality — it's containers and Kubernetes all the way down.
Production Cheat-Sheet
The whole manual compressed to what you reach for under pressure.
| Concept | One-liner |
|---|---|
| Container | An isolated process — namespaces (what it sees) + cgroups (what it uses) + an image filesystem. |
| vs VM | Containers share one host kernel; VMs each ship a full OS. Lighter, faster, weaker isolation. |
| Image vs container | Image = immutable template (class); container = running instance (object). |
| Layers | Images are stacked read-only layers, shared & cached; the container's writable top is ephemeral. |
| Dockerfile | Recipe; filesystem-changing instructions (RUN, COPY, ADD) each add a layer. RUN at build time, CMD at run time. |
| Multi-stage | Build in a heavy stage, copy only the artifact into a tiny final image. |
| Cache order | Deps before code: a changed layer invalidates the cache for every instruction after it. |
| Registry & tags | Push/pull images by name:tag; pin a version or SHA, never :latest. |
| Volumes | Data that must survive lives in a volume, not the container's writable layer. |
| Compose | Declarative multi-container stack on one host — local dev. |
| Orchestration | Scheduling, self-healing, scaling, discovery, rollouts across many nodes. |
| Declarative | Declare desired state; a control loop reconciles actual toward it. The whole model. |
| Pod | Smallest unit; one+ containers sharing network & storage. Mortal, gets a new IP when replaced. |
| Deployment | Keeps N replicas of a pod running; self-heals; does rolling updates. |
| Service | Stable IP/DNS load-balancing across matching pods — service discovery. |
| Ingress | One L7 entry point: host/path routing + TLS to many Services. |
| Config/Secrets | Inject at run time (ConfigMap/Secret); same image everywhere. Secrets are only base64-encoded, not encrypted at rest, by default. |
| Probes | Liveness → restart; readiness → in/out of rotation; startup → guard slow boots. |
| Requests/limits | Request = reserved (scheduling); limit = cgroup ceiling. Over memory → OOMKilled. |
| Scaling | Run more replicas; HPA autoscales on a metric (needs requests set). |
| Rollout/rollback | Gradual replace gated by readiness; kubectl rollout undo reverts. |
| CI/CD | Commit → build → test → scan → push → deploy, automatically; tag by SHA. |
| Strategies | Rolling (default), blue-green (instant flip), canary (trickle & watch). |
| GitOps | Git is desired state; an agent pulls & reconciles. Deploy by PR, roll back by revert. |
The whole topic in one breath: a container packages your app with its entire environment so "what runs in prod" equals "what you tested" (§1), achieved with Linux namespaces and cgroups, not a heavy VM (§2–3). You build an image as cached, shareable layers (§4) from a Dockerfile (§5), shrink and harden it with multi-stage builds and good cache ordering (§6–7), then push a SHA-tagged image to a registry (§8). Locally you wire several containers with networks, volumes, and Compose (§9–10). At fleet scale, Kubernetes reconciles declared desired state onto nodes (§11–12): Deployments keep replicas of pods alive and self-healing (§13–14), Services and Ingress give stable addressing and routing (§15), ConfigMaps/Secrets inject per-environment config (§16), probes and resource requests/limits drive health and scheduling (§17), and HPA plus rolling updates deliver scaling and zero-downtime releases (§18). CI/CD automates commit→build→test→scan→push→deploy (§19), and blue-green/canary/GitOps make releases safe and auditable (§20). One idea recurs at every layer: declare the desired state and let a loop keep reality matching it.
Grounded in the Docker & Kubernetes docs · OCI image-spec · the Twelve-Factor App · Go 1.22+ / Python 3.11+ application examples.