A detailed backend reference

gRPC, calling a remote function like a local one.

A first-principles walkthrough of the RPC framework that powers the internals of Google, Netflix, Uber and most modern microservice fleets — from the contract (Protocol Buffers) and the binary wire format, through the HTTP/2 transport and the four streaming shapes, to deadlines, interceptors, mTLS and production resilience. Written to explain not just what each piece does but why it exists and how it works underneath. Implementations in both Go and Python.

RPC over HTTP/2 · Protocol Buffers · Go 1.22+ · Python 3.11+ · 21 sections

Part I · Foundations

What gRPC Actually Is

gRPC is a framework for calling a function that lives on another machine as if it were a local function in your own process. The name expands to gRPC Remote Procedure Call (a recursive acronym; the "g" has been backronymed to a different word in every release for fun). The phrase that matters is the middle one: Remote Procedure Call. You write resp, err := client.GetUser(ctx, &pb.GetUserRequest{Id: "u42"}) and it looks like an ordinary method call, but under the hood the arguments are serialized and shipped across the network to a server, the method runs there, and the return value is shipped back, all hidden behind that one line.

Three things are true about gRPC and you need all three in your head at once. It is (1) a contract-first RPC framework: you describe your API in a .proto file and code is generated from it; (2) it serializes data with Protocol Buffers, a compact binary format, instead of text like JSON; and (3) it transports those bytes over HTTP/2, which gives it multiplexing and real bidirectional streaming. Everything else in this manual is a consequence of those three facts.

REST hands you a building with addressable rooms (resources) and a fixed set of things you may do to each room (GET, POST, …). gRPC hands you a remote control with labelled buttons (methods) — you don't think about rooms, you press GetUser or SendPayment and the work happens elsewhere. One models nouns; the other models verbs.

Where gRPC sits

gRPC is three layers stacked: your generated API, Protobuf, and HTTP/2

The top layer is the only part you hand-write. gRPC generates the glue that turns your method calls into Protobuf bytes flowing over HTTP/2 streams.

"Procedure call" is the whole idea

In ordinary code, calling a function is invisible plumbing: you pass arguments, the CPU jumps to the function, it returns a value. RPC asks a simple question — what if that function lived on a different computer? The dream of RPC (which dates back to the 1980s) is to make the network disappear, so a distributed system feels like one program. gRPC is the modern, production-grade realization of that dream: you never write socket code, never parse a response by hand, never build a URL. You call a typed method and handle a typed result or a typed error.

The one-sentence definition

gRPC = typed remote method calls, defined by a Protobuf contract, serialized as compact binary, carried over HTTP/2 streams — so a client in any language can call a server in any other language as if it were local.

Why gRPC Exists

gRPC isn't a replacement for REST everywhere — it was built to solve specific pains that show up when many services (often in different languages) talk to each other constantly, at high volume, inside a system. Google built it (open-sourced in 2015, evolved from an internal system called Stubby) precisely because REST-over-JSON was too slow, too loose, and too limited for service-to-service traffic at their scale. Five concrete problems drove its design.

The problem with REST/JSON	What gRPC does instead
JSON is bulky & slow. Text, with repeated field names on every object; parsing is CPU-heavy.	Protobuf binary: field numbers not names, varint-packed. Often 3–10× smaller and far faster to (de)serialize.
No enforced contract. The shape of a JSON payload lives in docs (or someone's head); drift causes runtime breakage.	The `.proto` file is the contract. Client & server generate code from the same source — mismatches fail at compile time.
Weak streaming. Plain HTTP/1.1 REST is request→response; streaming needs bolt-ons (SSE, polling, raw WebSockets).	Four call types including full bidirectional streaming, native to the framework over HTTP/2.
Polyglot friction. Every language hand-writes its own client & serialization, inconsistently.	One `.proto` generates idiomatic clients/servers for Go, Python, Java, C++, Rust, and more.
HTTP/1.1 connection overhead. One request per connection (or head-of-line blocking), repeated handshakes.	HTTP/2 multiplexes many concurrent calls over one long-lived connection.

The honest scope

Those strengths are aimed at internal, machine-to-machine communication — microservices, backend-to-backend, mobile-to-backend where you control both ends. gRPC is weaker than REST for public, browser-facing APIs (browsers can't speak raw gRPC — see §19) and for human-debuggable, cache-friendly, broadly-compatible endpoints. The full comparison is §18; for now, hold the frame: gRPC is the internal nervous system; REST is the public front door.

Where you'll meet it

Service meshes (Istio, Linkerd), Kubernetes' node-level plumbing (the kubelet talks to container runtimes over gRPC via the CRI), etcd, CockroachDB, Envoy's xDS, and the internal call graphs of many large fintech and consumer platforms. If you've sent a payment or streamed a video, there is a good chance gRPC carried some hop of that request between backend services.

The RPC Mental Model

Before any syntax, internalize the machinery that makes a remote call look local, because every gRPC concept later is just a named part of this picture. A local function call and a remote one differ in exactly one place: between "call" and "execute," the arguments have to cross a network. RPC inserts two pieces of generated code — a stub on the client and a skeleton/handler on the server — to hide that crossing.
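To make the stub/skeleton split concrete, here is a toy, in-process sketch. It is not gRPC: JSON stands in for Protobuf, a plain function call stands in for the network, and every name (UserStub, skeleton, HANDLERS) is made up for illustration.

```python
import json

# --- server side -----------------------------------------------------------
def get_user(user_id: str) -> dict:
    """The 'real' function that lives on the server (hypothetical)."""
    return {"id": user_id, "full_name": "Ada"}

HANDLERS = {"GetUser": get_user}

def skeleton(request_bytes: bytes) -> bytes:
    """Unmarshal the request, run the real function, marshal the reply."""
    request = json.loads(request_bytes)
    result = HANDLERS[request["method"]](*request["args"])
    return json.dumps(result).encode()

# --- client side -----------------------------------------------------------
class UserStub:
    """Looks like a local object; every method call crosses the 'wire'."""

    def __init__(self, transport):
        self._transport = transport  # any callable taking bytes, returning bytes

    def get_user(self, user_id: str) -> dict:
        request_bytes = json.dumps({"method": "GetUser", "args": [user_id]}).encode()
        reply_bytes = self._transport(request_bytes)  # the network hop would go here
        return json.loads(reply_bytes)

stub = UserStub(transport=skeleton)  # in-process stand-in for a socket
user = stub.get_user("u42")
print(user["full_name"])
```

The caller only ever touches stub.get_user; swapping the transport for a real socket would not change that line, which is the property gRPC's generated code gives you.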

The anatomy of a remote call

Stub and skeleton hide the network between "call" and "execute"

"Marshalling" = turning a language object into bytes; "unmarshalling" = the reverse. The stub marshals the request and unmarshals the reply; the skeleton does the mirror image. You write neither.

The leaky-abstraction warning

RPC tries to make the network invisible — but the network is never truly invisible, and pretending otherwise is the classic RPC trap. A local call can't time out, get lost, or be reordered; a remote one can do all three. That's why gRPC gives first-class tools for the things a local call never needed: deadlines (§13), status codes for partial failure (§14), retries (§17), and cancellation. Treat every remote call as "a local call that can fail in network-shaped ways," and you'll design resilient systems instead of brittle ones.

Don't pretend the wire isn't there

The single biggest mistake with RPC is writing remote calls as if they were free and infallible — no timeout, no error handling for UNAVAILABLE, no thought about latency in a loop. The abstraction is a convenience, not a guarantee. Always pass a context/deadline and always handle the error.

Part II · The Two Pillars

Protocol Buffers — the Contract

Protocol Buffers (“protobuf”) is two things wearing one name: an IDL (Interface Definition Language) for describing your messages and services, and a binary serialization format for encoding them on the wire. This section is the IDL half — the .proto file that is the single source of truth both sides generate code from. Get the contract right and the client and server literally cannot disagree about the shape of the data.

A .proto file defines messages (the data structures) and services (collections of RPC methods). Here is a complete, realistic example — a user service — annotated with every rule that matters:

user.proto

syntax = "proto3";              // always declare the syntax; proto3 is current

package user.v1;                // namespace + a versioning convention (v1, v2...)

option go_package = "example.com/gen/userv1;userv1";  // where generated Go lands

// A message is a typed record. Each FIELD has a type, a name, and a NUMBER.
message User {
  string id          = 1;       // the "= 1" is the FIELD NUMBER, not a value
  string email       = 2;
  string full_name   = 3;
  Role   role        = 4;       // an enum (declared below, at file level)
  repeated string tags = 5;     // "repeated" = a list/array of strings
  int64  created_at  = 6;       // unix seconds; no date scalar (see google.protobuf.Timestamp)
}

// An enum is a fixed set of named values. The first MUST be 0 (the default).
enum Role {
  ROLE_UNSPECIFIED = 0;         // 0 is the implicit default — reserve it
  ROLE_MEMBER      = 1;
  ROLE_ADMIN       = 2;
}

message GetUserRequest  { string id = 1; }
message GetUserResponse { User user = 1; }   // messages nest inside messages

message CreateUserRequest {
  string email     = 1;
  string full_name = 2;
  optional string phone = 3;    // "optional" tracks presence: set vs unset vs ""
}

// A SERVICE is a set of methods. Each takes one message and returns one message.
service UserService {
  rpc GetUser    (GetUserRequest)    returns (GetUserResponse);
  rpc CreateUser (CreateUserRequest) returns (User);
}

The rules that actually bite

Field numbers are the real identity, not names. On the wire, email is transmitted as field 2, never as the string "email" (this is why protobuf is compact — §05). The name is for your code; the number is the contract.
Never change or reuse a field number. This is the cardinal rule of schema evolution. Renaming a field is safe (names aren't on the wire); changing its number or type silently corrupts data for any peer still using the old definition. To remove a field, mark it reserved so the number can never be accidentally reused.
Adding fields is backward-compatible. Give a new field a fresh number; old clients simply don't see it, new servers treat it as unset when an old client omits it. This is how a gRPC API evolves without breaking deployed consumers — the same additive-change discipline you know from REST versioning.
proto3 has defaults, not nulls. An unset string is "", an unset int is 0, an unset bool is false. If you must distinguish "absent" from "zero," use optional (which adds presence tracking) — the same trap as Go's zero values, solved the same way (§13 of the REST/handlers manuals echoes this).
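The rules above can be sketched with a deliberately simplified model: treat the wire as a mapping from field numbers to values and a schema as a mapping from numbers to names. This is only an illustration of the idea, not how a protobuf library is implemented, and the schema names are hypothetical.

```python
# What travels on the wire: field NUMBERS mapped to values, never names.
wire = {1: "u42", 2: "ada@example.com"}

SCHEMA_V1         = {1: "id", 2: "email"}
SCHEMA_RENAMED    = {1: "id", 2: "email_address"}          # same numbers, new name
SCHEMA_ADDED      = {1: "id", 2: "email", 3: "full_name"}  # fresh number 3
SCHEMA_RENUMBERED = {1: "email", 2: "id"}                  # numbers swapped

def decode(wire: dict[int, str], schema: dict[int, str]) -> dict[str, str]:
    """Name each value by its number; absent fields fall back to the default ""."""
    return {name: wire.get(number, "") for number, name in schema.items()}

print(decode(wire, SCHEMA_RENAMED))     # rename: data still lands in the right field
print(decode(wire, SCHEMA_ADDED))       # added field: simply reads as the default
print(decode(wire, SCHEMA_RENUMBERED))  # renumber: id and email silently trade places
```

Note that the renumbered case raises no error at all, which is why the damage is described as silent.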

Construct	Means	Notes
scalar	`int32 int64 uint32 sint32 fixed64 float double bool string bytes`	Pick `sint*` for often-negative numbers; `bytes` for raw binary.
repeated	An ordered list of the field's type	The protobuf equivalent of an array/slice.
enum	A fixed value set; first entry must be `0`	Integrity + self-documentation, like a Postgres enum.
oneof	At most one of several fields is set	A tagged union — e.g. a result that is either a value or an error.
map<k,v>	An associative array	Sugar over a repeated key/value message.
optional	Adds explicit presence tracking to a scalar	Distinguishes “unset” from the zero value.

The contract is the API

In REST the contract is prose in a Swagger doc that code may or may not match. In gRPC the .proto is executable truth: both sides generate from it, so a field you added or a type you changed is reflected in both clients and servers the moment they regenerate. The schema can't silently drift from the implementation.

How Protobuf Encodes — the Wire Format

The reason gRPC is fast and small comes down to how protobuf turns a message into bytes. You don't hand-write this, but understanding it explains every performance claim and every gotcha. The core trick: each field is written as a tiny tag (which encodes the field number and a wire type) followed by the value — and integers are packed using varints that use fewer bytes for smaller numbers. No field names, no quotes, no commas, no whitespace.

JSON vs Protobuf, same data

Why the binary form is dramatically smaller

Multiply this saving across millions of messages per second and you see why high-traffic internal systems reach for protobuf over JSON.
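As a rough, hand-rolled illustration of the size difference, the sketch below encodes three fields of the User message (id = 1, email = 2, created_at = 6) the way the wire format describes and compares the result with compact JSON. The helper names are invented here; in real code the generated classes do this for you.

```python
import json

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def string_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data

def varint_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)

user = {"id": "u42", "email": "ada@example.com", "created_at": 1700000000}

as_json = json.dumps(user, separators=(",", ":")).encode()
as_proto = (
    string_field(1, user["id"])
    + string_field(2, user["email"])
    + varint_field(6, user["created_at"])
)
print(len(as_json), len(as_proto))
```

For this small record the hand-encoded form comes to 28 bytes against 62 for compact JSON; the exact ratio depends heavily on the data, so treat the number as an illustration rather than a benchmark.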

The tag: field number + wire type in one byte (usually)

Each field on the wire begins with a tag computed as (field_number << 3) | wire_type. The low 3 bits are the wire type (how to read the bytes that follow: varint, 64-bit, length-delimited, 32-bit); the rest is the field number. So the decoder reads the tag, learns "this is field 2, and it's length-delimited," and knows exactly how to consume what comes next — even if it has never seen that field before (it can skip unknown fields, which is what makes forward-compatibility work).
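The tag arithmetic is small enough to try by hand. A minimal sketch, with helper names chosen for this example:

```python
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}

def make_tag(field_number: int, wire_type: int) -> int:
    return (field_number << 3) | wire_type

def split_tag(tag: int) -> tuple[int, str]:
    """Low 3 bits are the wire type; everything above them is the field number."""
    return tag >> 3, WIRE_TYPES[tag & 0b111]

tag = make_tag(2, 2)           # field 2 (email), length-delimited
print(hex(tag), split_tag(tag))
```

This also shows where the "usually" in "one byte (usually)" comes from: field numbers 1 to 15 give a tag below 128, which fits in a single varint byte, while field 16 and above need two. It is a common reason to give the most frequently set fields the low numbers.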

Decoding one field

How a tag byte tells the reader what to do next

Because the reader can identify and skip a field it doesn't recognize, a new server can add fields without breaking old clients — the wire format is self-describing enough to step over the unknown.
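Skipping an unknown field can be sketched as a small decoder loop. This is a simplified model written for this section (it handles the four wire types named above and nothing else), and the field 7 in the sample bytes is a hypothetical later addition to the schema.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def decode(buf: bytes, known: dict[int, str]) -> dict:
    """Decode the fields listed in `known`; step over everything else."""
    out, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0b111
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:          # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:          # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:          # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:
            out[known[field_number]] = value
        # an unknown field number was still consumed above, so the loop stays aligned
    return out

# field 1 = "u42", field 2 = "ada@example.com", field 7 (new) = varint 1
message = b"\x0a\x03u42" + b"\x12\x0fada@example.com" + b"\x38\x01"

old_schema = {1: "id", 2: "email"}
new_schema = {1: "id", 2: "email", 7: "verified"}
print(decode(message, old_schema))
print(decode(message, new_schema))
```

The old reader never learns what field 7 means, but the wire type alone tells it how many bytes to step over. This sketch discards the unknown field; real protobuf runtimes typically retain unknown fields so they survive a re-serialization.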

The cost of binary: it isn't human-readable

You can't curl a gRPC endpoint and eyeball the JSON. Debugging needs tools that understand the schema (grpcurl, reflection — §20). That opacity is the price of the speed and size; it's the main reason public/debuggable APIs often stay on REST.

HTTP/2 — the Transport Underneath

gRPC doesn't invent its own transport: it rides on HTTP/2, and almost every gRPC superpower (concurrency, all four call shapes, low overhead) is really an HTTP/2 feature. The thing to understand is HTTP/2's central idea: a single TCP connection is divided into many independent, interleaved streams, each carrying a sequence of binary frames. That's what lets one connection carry hundreds of concurrent RPCs, and lets data flow in both directions at once.

Multiplexing

One connection, many concurrent RPCs interleaved as frames

A gRPC call maps to one HTTP/2 stream. Many calls → many streams → one connection. This is why a gRPC client keeps a long-lived connection and reuses it.

How a gRPC call maps onto HTTP/2

Concretely, an RPC is an HTTP/2 POST to a path shaped like /package.Service/Method (e.g. /user.v1.UserService/GetUser). gRPC metadata travels as HTTP/2 headers; the serialized protobuf travels in DATA frames as the body; and the final status (the gRPC status code — §14) arrives in HTTP/2 trailers after the body. Streaming simply means more than one message flows in one or both directions on that stream before it closes.

gRPC concept	HTTP/2 mechanism
One RPC call	One HTTP/2 stream (request + response)
Method being called	`:path` header = `/pkg.Service/Method`
Metadata (auth tokens, trace IDs)	HTTP/2 request & response headers
The request/response message(s)	length-prefixed protobuf in DATA frames
Final status + message	`grpc-status` / `grpc-message` trailers
Streaming	multiple DATA frames before the stream half-closes
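Two rows of that table are easy to reproduce by hand: the :path value and the length prefix on each message. In gRPC's HTTP/2 mapping each message in a DATA frame is preceded by a 1-byte compressed flag and a 4-byte big-endian length. The helper names below are illustrative only.

```python
import struct

def method_path(package: str, service: str, method: str) -> str:
    """The value of the HTTP/2 :path pseudo-header for one RPC."""
    return f"/{package}.{service}/{method}"

def frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix one serialized message: 1-byte flag + 4-byte big-endian length."""
    return struct.pack(">BI", int(compressed), len(message)) + message

def unframe(buf: bytes) -> list[bytes]:
    """Split a stream body back into its individual messages."""
    messages, pos = [], 0
    while pos < len(buf):
        _flag, length = struct.unpack_from(">BI", buf, pos)
        pos += 5
        messages.append(buf[pos:pos + length])
        pos += length
    return messages

path = method_path("user.v1", "UserService", "GetUser")
unary_body = frame(b"\x0a\x03u42")                 # one message: a unary request
stream_body = frame(b"first") + frame(b"second")   # several messages: a stream
print(path, unary_body.hex(), unframe(stream_body))
```

Seen this way, unary and streaming calls differ only in how many framed messages appear on the stream before it closes.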

Why this matters for you

Because gRPC needs HTTP/2 end to end, any proxy/load balancer in the path must speak HTTP/2 and do L7 (request-aware) balancing — a naive L4 TCP balancer will pin every call to one backend, since they all share one connection (§17). And because browsers can't expose raw HTTP/2 framing to JS, browsers can't speak native gRPC at all (§19). The transport choice shapes the whole deployment story.

Part III · The Four Call Types

Unary RPC

The simplest and most common shape: one request, one response, exactly like a normal function call. The client sends a single message, the server does its work and returns a single message. The large majority of real-world RPCs are unary. We'll build the GetUser method from the user.proto in §04, in full, in both languages.

Unary

One message each way

// ===== SERVER =====
package main

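import (
    "context"
    "log"
    "net"
    "time" // used by the client below

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/credentials/insecure" // used by the client below
    "google.golang.org/grpc/status"
    userv1 "example.com/gen/userv1" // generated from user.proto
)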

// Embed the generated UnimplementedUserServiceServer for forward-compat.
type server struct {
    userv1.UnimplementedUserServiceServer
}

// The method signature is generated FROM the proto: ctx, *Request -> *Response, error
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    if req.GetId() == "" {
        return nil, status.Error(codes.InvalidArgument, "id is required") // typed error, §14
    }
    // ...real work: query the DB by req.GetId()...
    u := &userv1.User{Id: req.GetId(), Email: "ada@example.com", FullName: "Ada", Role: userv1.Role_ROLE_ADMIN}
    return &userv1.GetUserResponse{User: u}, nil
}

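func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("listen failed: %v", err)
    }
    s := grpc.NewServer()
    userv1.RegisterUserServiceServer(s, &server{}) // wire the impl to the service
    log.Println("gRPC on :50051")
    if err := s.Serve(lis); err != nil { // blocks until the server stops
        log.Fatalf("serve failed: %v", err)
    }
}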

// ===== CLIENT =====
func callGetUser() {
    // NewClient replaces the deprecated grpc.Dial; insecure creds for local dev only
    conn, err := grpc.NewClient("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Printf("could not create client: %v", err)
        return
    }
    defer conn.Close()

    client := userv1.NewUserServiceClient(conn) // the generated STUB
    ctx, cancel := context.WithTimeout(context.Background(), time.Second) // always a deadline, §13
    defer cancel()

    resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"}) // looks local, runs remote
    if err != nil {
        // err carries the gRPC status code; status.Code extracts it
        log.Printf("GetUser failed: %v (code %s)", err, status.Code(err))
        return
    }
    log.Printf("got user: %s", resp.GetUser().GetFullName())
}

# ===== SERVER =====
from concurrent import futures
import grpc
import user_pb2 as pb          # generated: messages
import user_pb2_grpc as pb_grpc  # generated: service base classes

class UserService(pb_grpc.UserServiceServicer):
    # Method signature generated FROM the proto: (self, request, context) -> response
    def GetUser(self, request, context):
        if not request.id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")  # typed error, §14
        # ...real work: query the DB by request.id...
        user = pb.User(id=request.id, email="ada@example.com",
                       full_name="Ada", role=pb.ROLE_ADMIN)
        return pb.GetUserResponse(user=user)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb_grpc.add_UserServiceServicer_to_server(UserService(), server)  # wire impl to service
    server.add_insecure_port("[::]:50051")  # insecure = local dev only
    server.start()
    print("gRPC on :50051")
    server.wait_for_termination()

# ===== CLIENT =====
def call_get_user():
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = pb_grpc.UserServiceStub(channel)        # the generated STUB
        try:
            # timeout= is the deadline, §13; looks local, runs remote
            resp = stub.GetUser(pb.GetUserRequest(id="u42"), timeout=1.0)
            print("got user:", resp.user.full_name)
        except grpc.RpcError as e:
            print("failed:", e.code(), e.details())     # carries the gRPC status code

if __name__ == "__main__":
    serve()

Notice the symmetry across languages: a generated stub on the client, a generated servicer/server base class on the server, and a method whose exact signature came from the proto.

Server Streaming

One request, a stream of responses. The client asks once; the server sends back many messages over time, then closes the stream. Perfect for: returning a large result set in chunks, a live feed of events, progress updates on a long job, or paginating without repeated round trips. The proto marks the response as stream.

Server streaming

Ask once, receive many

// proto:  rpc ListUsers(ListUsersRequest) returns (stream User);

// ===== SERVER: receive one req, call stream.Send(...) repeatedly =====
func (s *server) ListUsers(req *userv1.ListUsersRequest, stream userv1.UserService_ListUsersServer) error {
    for _, u := range queryUsers(req.GetFilter()) { // imagine this yields a big result set
        if err := stream.Send(u); err != nil {       // push one message down the stream
            return err                                // client gone / cancelled
        }
    }
    return nil // returning nil closes the stream cleanly (sends OK trailer)
}

// ===== CLIENT: call once, then Recv() in a loop until io.EOF =====
func listUsers(client userv1.UserServiceClient) {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    stream, err := client.ListUsers(ctx, &userv1.ListUsersRequest{Filter: "active"})
    if err != nil { log.Fatal(err) } // opening the stream can fail before any message arrives
    for {
        u, err := stream.Recv()
        if err == io.EOF { break } // server closed the stream — we're done
        if err != nil { log.Fatal(err) }
        log.Printf("user: %s", u.GetFullName())
    }
}

# proto:  rpc ListUsers(ListUsersRequest) returns (stream User);

# ===== SERVER: a generator — every `yield` sends one message =====
class UserService(pb_grpc.UserServiceServicer):
    def ListUsers(self, request, context):
        for u in query_users(request.filter):  # big result set
            yield u                             # yielding pushes one message down the stream
        # function returning ends the stream cleanly

# ===== CLIENT: the call returns an ITERABLE of responses =====
def list_users(stub):
    responses = stub.ListUsers(pb.ListUsersRequest(filter="active"), timeout=10.0)
    for u in responses:        # iterate until the server closes the stream
        print("user:", u.full_name)

Go uses an explicit stream.Send/stream.Recv pair; Python expresses the server side as a generator (yield) and the client side as a plain iterable — idiomatic to each language, same wire behavior.

Client Streaming

A stream of requests, one response. The client sends many messages, then the server replies once with a summary/result. Ideal for uploads, batch ingestion, or aggregating a series of readings into a single computed answer. The proto marks the request as stream.

Client streaming

Send many, get one back

// proto:  rpc UploadEvents(stream Event) returns (UploadSummary);

// ===== SERVER: Recv() in a loop, then SendAndClose() once at the end =====
func (s *server) UploadEvents(stream userv1.UserService_UploadEventsServer) error {
    count := 0
    for {
        ev, err := stream.Recv()
        if err == io.EOF { // client finished sending — now reply once
            return stream.SendAndClose(&userv1.UploadSummary{Received: int32(count)})
        }
        if err != nil { return err }
        store(ev)
        count++
    }
}

// ===== CLIENT: Send() many, then CloseAndRecv() for the single reply =====
func uploadEvents(client userv1.UserServiceClient, events []*userv1.Event) {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
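    stream, err := client.UploadEvents(ctx)
    if err != nil { log.Fatal(err) }
    for _, ev := range events {
        if err := stream.Send(ev); err != nil { // a failed Send means the stream is broken
            break                                // CloseAndRecv below reports the real status
        }
    }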
    summary, err := stream.CloseAndRecv() // close our side, await the summary
    if err != nil { log.Fatal(err) }
    log.Printf("server received %d events", summary.GetReceived())
}

# proto:  rpc UploadEvents(stream Event) returns (UploadSummary);

# ===== SERVER: request_iterator yields incoming messages; return once =====
class UserService(pb_grpc.UserServiceServicer):
    def UploadEvents(self, request_iterator, context):
        count = 0
        for ev in request_iterator:   # iterate the client's stream
            store(ev)
            count += 1
        return pb.UploadSummary(received=count)  # single reply after client closes

# ===== CLIENT: pass an iterable/generator of requests; get one response =====
def upload_events(stub, events):
    def gen():
        for ev in events:
            yield ev
    summary = stub.UploadEvents(gen(), timeout=30.0)  # returns the single summary
    print("server received", summary.received, "events")

The asymmetry is the point: SendAndClose/CloseAndRecv in Go, a request_iterator plus a normal return in Python — the server consumes the whole request stream before producing its one answer.

Bidirectional Streaming

A stream of requests and a stream of responses, simultaneously and independently. Both sides read and write on the same stream at the same time, in any order — this is what HTTP/2's full-duplex framing (§06) buys you. It's the shape behind chat, real-time collaboration, live telemetry with control messages, and interactive sessions. Both request and response are stream in the proto.

Bidirectional streaming

Both sides send and receive at once

// proto:  rpc Chat(stream ChatMessage) returns (stream ChatMessage);

// ===== SERVER: loop Recv() and Send() on the same stream =====
func (s *server) Chat(stream userv1.UserService_ChatServer) error {
    for {
        msg, err := stream.Recv()
        if err == io.EOF { return nil } // client closed its send side
        if err != nil { return err }
        // echo back (or broadcast to a room, etc.) — can Send anytime, any number
        reply := &userv1.ChatMessage{User: "server", Text: "ack: " + msg.GetText()}
        if err := stream.Send(reply); err != nil { return err }
    }
}

// ===== CLIENT: typically Send in one goroutine, Recv in another =====
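func chat(client userv1.UserServiceClient) {
    ctx, cancel := context.WithTimeout(context.Background(), time.Minute) // bound the stream's lifetime, §13
    defer cancel()
    stream, err := client.Chat(ctx)
    if err != nil { log.Fatal(err) }
    done := make(chan struct{})
    go func() { // concurrent receiver
        defer close(done)
        for {
            in, err := stream.Recv()
            if err != nil { return } // io.EOF when the server finishes, or a real error
            log.Printf("<< %s", in.GetText())
        }
    }()
    for _, t := range []string{"hi", "how are you", "bye"} {
        if err := stream.Send(&userv1.ChatMessage{User: "ada", Text: t}); err != nil { break }
    }
    stream.CloseSend() // signal we're done sending
    <-done             // wait for the receiver to drain the remaining replies
}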

# proto:  rpc Chat(stream ChatMessage) returns (stream ChatMessage);

# ===== SERVER: iterate incoming, yield outgoing — interleaved =====
class UserService(pb_grpc.UserServiceServicer):
    def Chat(self, request_iterator, context):
        for msg in request_iterator:          # read the client's stream
            yield pb.ChatMessage(user="server", text="ack: " + msg.text)  # write back

# ===== CLIENT: pass a request generator; iterate the response stream =====
def chat(stub):
    def outgoing():
        for t in ["hi", "how are you", "bye"]:
            yield pb.ChatMessage(user="ada", text=t)
    responses = stub.Chat(outgoing())   # both directions live at once
    for reply in responses:
        print("<<", reply.text)

# (for true concurrency under load, prefer the async API: grpc.aio)

Streaming gotchas to respect

Streams are long-lived, so each one holds an HTTP/2 stream and a goroutine (or worker thread) for its lifetime: always set deadlines or idle limits, and handle the client vanishing mid-stream. The only built-in back-pressure is HTTP/2 flow control, which blocks the sender once the receiver's window fills; if you queue messages in your own buffers ahead of Send, a fast producer can still overwhelm a slow consumer unless you pace it. And a single stream is ordered but not a transaction: if it breaks halfway, you've delivered a prefix, so design messages to be resumable or idempotent.

The four shapes at a glance

Pick by how many messages flow each way

Part IV · The Machinery

The .proto → Code Workflow

You never write the stubs and skeletons by hand — a compiler, protoc (or the buf toolchain), reads your .proto and emits idiomatic source for each target language via a language-specific plugin. This is the step that turns the contract into callable code, and it's the reason a Go client and a Python server can interoperate flawlessly: both were generated from the same file.

Code generation

One contract, many generated clients and servers

// Install the two plugins once (they sit on your PATH):
//   go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
//   go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

// Generate messages (--go_out) AND the service stubs/skeleton (--go-grpc_out):
//   protoc \
//     --go_out=. --go_opt=paths=source_relative \
//     --go-grpc_out=. --go-grpc_opt=paths=source_relative \
//     user.proto

// Produces:
//   user.pb.go        — structs for each message (+ getters)
//   user_grpc.pb.go   — UserServiceClient (stub) + UserServiceServer (to implement)

// Then in code you simply import the generated package:
//   import userv1 "example.com/gen/userv1"

# Install the tooling once:
#   pip install grpcio grpcio-tools

# Generate messages AND service stubs in one command:
#   python -m grpc_tools.protoc \
#       -I. \
#       --python_out=. \
#       --grpc_python_out=. \
#       user.proto

# Produces:
#   user_pb2.py       — message classes
#   user_pb2_grpc.py  — UserServiceStub (client) + UserServiceServicer (to implement)

# Then import both in your code:
#   import user_pb2 as pb
#   import user_pb2_grpc as pb_grpc

Use buf in real projects

Raw protoc invocations get unwieldy fast. The buf toolchain wraps it with a config file, dependency management, breaking-change detection (it fails CI if you'd violate the never-reuse-a-field-number rule), and a linter. For anything beyond a toy, prefer buf — it turns the schema-evolution discipline from §04 into an automated gate.

Channels, Stubs & a Call End-to-End

Two client-side objects matter, and people conflate them. A channel (Go calls it a ClientConn) is the long-lived, reusable connection to a server — it manages the underlying HTTP/2 connection(s), reconnection, and load-balancing state. A stub is the cheap, generated object you create on top of a channel to actually call methods. The rule: create the channel once and share it; create stubs freely. Opening a channel per request destroys performance — you throw away the connection reuse that was the whole point of HTTP/2.

A unary call, end to end

Every step from your method call to the typed reply

The performance rule

One channel per server, shared across your whole app, for its whole lifetime. Stubs are throwaway. If your latency is mysteriously bad, the first thing to check is whether you're creating a channel (and thus a fresh HTTP/2 + TLS handshake) on every call.
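The difference is easy to see with a stand-in. FakeChannel below is not a gRPC class; it only counts how often the expensive setup would happen, and get_channel is one hypothetical way to share a channel per target.

```python
import functools

class FakeChannel:
    """Stand-in for a gRPC channel; counts how often a 'handshake' happens."""
    handshakes = 0

    def __init__(self, target: str):
        FakeChannel.handshakes += 1  # pretend: TCP + TLS + HTTP/2 setup
        self.target = target

    def call(self, method: str) -> str:
        return f"{method} via {self.target}"

@functools.lru_cache(maxsize=None)
def get_channel(target: str) -> FakeChannel:
    """One channel per target, created lazily and shared by every caller."""
    return FakeChannel(target)

# Anti-pattern: a new channel for every call.
for _ in range(100):
    FakeChannel("users:50051").call("GetUser")
per_call = FakeChannel.handshakes

# Recommended: every call reuses the same channel.
FakeChannel.handshakes = 0
for _ in range(100):
    get_channel("users:50051").call("GetUser")
shared = FakeChannel.handshakes

print(per_call, shared)
```

With a real channel the same shape applies: build it once at startup (or behind a cache like this), hand it to whatever creates stubs, and close it at shutdown.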

Metadata, Deadlines & Cancellation

These are the tools that acknowledge the network is real (the §03 warning, made concrete). Metadata is gRPC's key–value side-channel — the equivalent of HTTP headers — for things that aren't the message itself: auth tokens, trace/request IDs, API versions. Deadlines put an absolute time bound on a call and, crucially, propagate across hops. Cancellation lets a caller (or a broken connection) abort in-flight work so servers don't toil on results nobody wants.

Deadlines beat timeouts — and they propagate

A gRPC deadline is an absolute point in time, not a per-hop duration. When service A calls B with a 1-second deadline and B calls C, the remaining budget travels along — so C knows it has, say, 600ms left, not a fresh second. This prevents the classic cascade where each layer waits its own full timeout and total latency balloons. Always set a deadline on every call. A call with no deadline can hang forever, pinning resources.
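The budget arithmetic can be sketched in-process. Three plain functions stand in for services A, B and C, and the same absolute deadline is handed down the chain; the names and the 50 ms of simulated work are assumptions for this example. Between real machines gRPC sends the remaining time in a grpc-timeout header and the receiver rebuilds its own deadline from it, since clocks are not shared.

```python
import time

class DeadlineExceeded(Exception):
    pass

def remaining(deadline: float) -> float:
    """Seconds left before the absolute deadline, or raise if it has passed."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise DeadlineExceeded()
    return left

def service_c(deadline: float) -> float:
    return remaining(deadline)      # C sees only what is left of the budget

def service_b(deadline: float) -> float:
    time.sleep(0.05)                # B's own work eats into the shared budget
    return service_c(deadline)      # forward the SAME deadline, not a fresh timeout

def service_a(timeout: float) -> float:
    deadline = time.monotonic() + timeout   # the duration becomes a point in time
    return service_b(deadline)

budget_seen_by_c = service_a(timeout=1.0)
print(f"C has {budget_seen_by_c:.3f}s left")
```

Had each hop started its own one-second timer instead, the worst case for the caller would be the sum of all of them rather than the single second it asked for.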

// ===== CLIENT: attach a deadline + metadata =====
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second) // absolute deadline
defer cancel() // cancel frees resources whether we time out or finish early

ctx = metadata.AppendToOutgoingContext(ctx,
    "authorization", "Bearer "+token,   // auth travels as metadata, §16
    "x-request-id", reqID)              // trace id for correlation, like §15 of the layers manual

resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"})
if status.Code(err) == codes.DeadlineExceeded {
    log.Println("call timed out") // a network-shaped failure a local call never had
}

// ===== SERVER: read metadata, respect the inherited deadline =====
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    md, _ := metadata.FromIncomingContext(ctx)
    auth := md.Get("authorization") // verify token here (or in an interceptor, §15)

    // ctx already carries the client's remaining deadline + cancellation —
    // pass it straight to the DB driver so slow work is abandoned automatically.
    if ctx.Err() != nil { return nil, status.FromContextError(ctx.Err()).Err() }
    _ = auth
    return s.lookup(ctx, req.GetId())
}

# ===== CLIENT: attach a deadline (timeout=) + metadata =====
metadata = (
    ("authorization", f"Bearer {token}"),  # auth as metadata, §16
    ("x-request-id", req_id),               # trace id for correlation
)
try:
    resp = stub.GetUser(pb.GetUserRequest(id="u42"),
                        timeout=1.0,         # the deadline, in seconds
                        metadata=metadata)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("call timed out")             # a network-shaped failure

# ===== SERVER: read metadata, check for cancellation =====
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        md = dict(context.invocation_metadata())
        auth = md.get("authorization")       # verify token (or in an interceptor, §15)

        if not context.is_active():          # client gone / deadline passed?
            return pb.GetUserResponse()      # abandon the work
        return self.lookup(request.id)

Pass the context down, always

In Go, thread the incoming ctx into every downstream call (DB queries, outbound RPCs). That's what makes deadlines and cancellation actually work — if the client hangs up, the cancellation ripples all the way down and frees everything. A handler that ignores ctx keeps grinding on abandoned work.
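In Python the outbound stub call takes its own timeout= argument, so the remaining budget has to be forwarded by hand. A minimal sketch, assuming the handler reads the remaining time from context.time_remaining() (seconds, or None when the caller set no deadline); the helper name downstream_timeout and the default and reserve values are hypothetical choices, not gRPC settings.

```python
from typing import Optional

def downstream_timeout(time_remaining: Optional[float],
                       default: float = 1.0,
                       reserve: float = 0.05) -> float:
    """Pick the timeout= for an outbound call made inside a handler.

    time_remaining: what context.time_remaining() reported (None = no deadline).
    default:        fallback so a deadline-less caller cannot make us wait forever.
    reserve:        slice kept back for our own work after the downstream call returns.
    """
    if time_remaining is None:
        return default
    return max(0.0, time_remaining - reserve)

# Inside a servicer method this might be used as (other_stub is hypothetical):
#   timeout = downstream_timeout(context.time_remaining())
#   reply = other_stub.Lookup(request, timeout=timeout)
print(downstream_timeout(0.6))    # caller left us 600 ms
print(downstream_timeout(None))   # caller set no deadline: fall back to the default
print(downstream_timeout(0.01))   # budget already smaller than the reserve
```

Keeping a small reserve is a design choice: it leaves the handler time to build its own response once the downstream call returns.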

Status Codes & Error Handling

gRPC does not report the outcome of a call through HTTP status codes. It has its own fixed set of status codes, an enum of 17 values (OK plus 16 error codes), sent in the grpc-status trailer (§06). Every call ends with exactly one: OK for success, or one of the error codes with an optional message. Returning the right code is part of your contract, because clients (and retry policies, §17) switch on it. The table lists the codes you will reach for most often.

Code	Means	Closest HTTP analogue
OK	Success	200
INVALID_ARGUMENT	Client sent bad input (independent of system state)	400
UNAUTHENTICATED	No / invalid credentials	401
PERMISSION_DENIED	Authenticated but not allowed	403
NOT_FOUND	The requested entity doesn't exist	404
ALREADY_EXISTS	Create conflicts with an existing entity	409
FAILED_PRECONDITION	System not in a state for the operation	400/409
RESOURCE_EXHAUSTED	Quota / rate limit hit	429
DEADLINE_EXCEEDED	Call ran past its deadline	504
UNAVAILABLE	Transient: server down/overloaded; retry with backoff (automatically only for idempotent calls, §17)	503
INTERNAL	A real bug / invariant broken	500
UNIMPLEMENTED	Method not implemented on this server	501
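For reference, the codes in the table can be written down with their numeric values, which is what actually appears in the grpc-status trailer. The sketch below is a plain Python enum, not the grpc package's own StatusCode type; the HTTP_ANALOGUE mapping simply restates the table's last column, taking the first option where the table lists two.

```python
from enum import IntEnum

class Code(IntEnum):
    """The status codes from the table, with their standard numeric values."""
    OK = 0
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16

# Closest HTTP analogue, restating the table (an analogy, not a protocol rule).
HTTP_ANALOGUE = {
    Code.OK: 200,
    Code.INVALID_ARGUMENT: 400,
    Code.UNAUTHENTICATED: 401,
    Code.PERMISSION_DENIED: 403,
    Code.NOT_FOUND: 404,
    Code.ALREADY_EXISTS: 409,
    Code.FAILED_PRECONDITION: 400,
    Code.RESOURCE_EXHAUSTED: 429,
    Code.DEADLINE_EXCEEDED: 504,
    Code.UNAVAILABLE: 503,
    Code.INTERNAL: 500,
    Code.UNIMPLEMENTED: 501,
}

def http_analogue(code: Code) -> int:
    """Look up the closest HTTP status for a gRPC code from the table."""
    return HTTP_ANALOGUE[code]

print(int(Code.NOT_FOUND), http_analogue(Code.NOT_FOUND))
```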

import (
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// SERVER: return a typed status, not a bare error string.
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    if req.GetId() == "" {
        return nil, status.Error(codes.InvalidArgument, "id is required")
    }
    u, found := s.db.Find(req.GetId())
    if !found {
        return nil, status.Errorf(codes.NotFound, "no user with id %q", req.GetId())
    }
    return &userv1.GetUserResponse{User: u}, nil
}

// CLIENT: inspect the code to decide what to do.
resp, err := client.GetUser(ctx, req)
if err != nil {
    st := status.Convert(err)        // pull the status out of the error
    switch st.Code() {
    case codes.NotFound:        // expected — show "not found" in UI
    case codes.Unavailable:     // transient — retry with backoff, §17
    default:                    log.Printf("unexpected: %v: %s", st.Code(), st.Message())
    }
}

import grpc

# SERVER: abort with a code + message (or set_code/set_details then return).
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        if not request.id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")
        user = self.db.find(request.id)
        if user is None:
            context.abort(grpc.StatusCode.NOT_FOUND, f"no user with id {request.id!r}")
        return pb.GetUserResponse(user=user)

# CLIENT: catch RpcError and switch on .code()
try:
    resp = stub.GetUser(req, timeout=1.0)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.NOT_FOUND:
        ...   # expected — show "not found"
    elif e.code() == grpc.StatusCode.UNAVAILABLE:
        ...   # transient — retry with backoff, §17
    else:
        print("unexpected:", e.code(), e.details())

Rich, structured errors

When a code + message isn't enough (e.g. per-field validation details, like the REST error envelope), gRPC supports error details — typed protobuf messages attached to the status (the google.rpc types: BadRequest, QuotaFailure, RetryInfo…). The client deserializes them as structured data, never by parsing a human string — same principle as the machine-readable code field in the REST manual's error envelope.
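To make "structured, never parsed" concrete, here is a sketch of the client side. It uses plain dataclasses as stand-ins for the google.rpc.BadRequest detail type (a list of field violations, each with a field and a description); the Status class and the sample values are hypothetical, and real code would unpack the protobuf details attached to the status instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldViolation:          # stand-in for google.rpc.BadRequest.FieldViolation
    field: str
    description: str

@dataclass(frozen=True)
class BadRequest:              # stand-in for google.rpc.BadRequest
    field_violations: tuple

@dataclass(frozen=True)
class Status:                  # hypothetical: code + message + typed details
    code: str
    message: str
    details: tuple = ()

def violations_by_field(status: Status) -> dict:
    """Collect per-field problems from typed details; the message is never parsed."""
    out = {}
    for detail in status.details:
        if isinstance(detail, BadRequest):
            for v in detail.field_violations:
                out.setdefault(v.field, []).append(v.description)
    return out

st = Status(
    code="INVALID_ARGUMENT",
    message="request failed validation",
    details=(BadRequest((FieldViolation("email", "not a valid address"),
                         FieldViolation("age", "must be >= 0"))),),
)
print(violations_by_field(st))
```

A UI can attach each description to the matching form field, and the human-readable message can change freely without breaking any client.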

Interceptors — the Middleware of gRPC

If you read the handlers/services/middleware chapter, this is the exact same idea with a different name. An interceptor is a function that wraps every RPC, running before and/or after your handler — the place to centralize cross-cutting concerns so you don't repeat them in every method: authentication, logging, metrics, tracing, panic recovery, rate limiting. They come in two flavours: unary interceptors (wrap one-shot calls) and stream interceptors (wrap streaming calls), on both the client and the server side.

The interceptor chain

Cross-cutting logic wraps the handler, just like HTTP middleware

// A unary server interceptor: signature is fixed by the framework.
func authInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {

    md, _ := metadata.FromIncomingContext(ctx)
    tokens := md.Get("authorization")
    if len(tokens) == 0 || !valid(tokens[0]) {
        return nil, status.Error(codes.Unauthenticated, "missing or invalid token")
    }
    // attach the verified identity for the handler to read from ctx
    ctx = context.WithValue(ctx, userKey{}, parse(tokens[0]))
    return handler(ctx, req) // call the next link / the real handler
}

func loggingInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    log.Printf("%s took %s -> %s", info.FullMethod, time.Since(start), status.Code(err))
    return resp, err
}

// Register the chain when building the server (outermost listed first):
s := grpc.NewServer(
    grpc.ChainUnaryInterceptor(loggingInterceptor, authInterceptor),
)

from concurrent import futures  # ThreadPoolExecutor for grpc.server below

import grpc

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        md = dict(handler_call_details.invocation_metadata)
        token = md.get("authorization", "")
        if not valid(token):
            # short-circuit: abort before the handler ever runs
            def deny(request, context):
                context.abort(grpc.StatusCode.UNAUTHENTICATED, "missing or invalid token")
            return grpc.unary_unary_rpc_method_handler(deny)
        return continuation(handler_call_details)  # proceed to next / handler

# Register interceptors when building the server (applied in order):
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor()],
)

# Client-side interceptors also exist (e.g. to inject auth on every call):
#   channel = grpc.intercept_channel(base_channel, MyClientInterceptor())

This is the gRPC home for everything the layers manual put in middleware: auth, logging, tracing, rate limiting, panic recovery — written once, applied to every method.
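The wrapping order is easier to see with the framework stripped away. The following is a framework-free sketch of how a chain composes, with the first interceptor listed as the outermost layer; the chain helper, the (ctx, request, next) signature, and the "Bearer secret" token are illustrative assumptions, not the grpc API.

```python
from typing import Callable, List

Handler = Callable[[dict, str], str]                 # (ctx, request) -> response
Interceptor = Callable[[dict, str, Handler], str]    # also receives the next link

def chain(interceptors: List[Interceptor], handler: Handler) -> Handler:
    """Wrap handler so that interceptors[0] is the outermost layer."""
    wrapped = handler
    for icpt in reversed(interceptors):
        # Default arguments bind the current values; a bare closure would
        # only ever see the last icpt/wrapped pair.
        def layer(ctx, req, _icpt=icpt, _next=wrapped):
            return _icpt(ctx, req, _next)
        wrapped = layer
    return wrapped

calls = []  # records the order in which layers run

def logging_interceptor(ctx, req, nxt):
    calls.append("log:before")
    try:
        return nxt(ctx, req)
    finally:
        calls.append("log:after")       # runs even when an inner layer fails

def auth_interceptor(ctx, req, nxt):
    calls.append("auth")
    if ctx.get("authorization") != "Bearer secret":
        raise PermissionError("UNAUTHENTICATED")   # short-circuit: handler never runs
    return nxt(ctx, req)

def get_user(ctx, req):
    calls.append("handler")
    return f"user:{req}"

serve = chain([logging_interceptor, auth_interceptor], get_user)
print(serve({"authorization": "Bearer secret"}, "u42"))
print(calls)
```

This mirrors the registration order in the Go example, where logging is listed first and therefore also observes calls that the auth layer rejects.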

Part V · Production & Interop

Authentication & Security

gRPC security splits into two orthogonal questions: channel security (is the connection encrypted and is the peer who they claim to be? → TLS / mTLS) and call credentials (who is the caller for this request? → a token in metadata). You combine them: TLS protects the pipe, a per-call token identifies the user. The insecure credentials used in earlier examples are for local development only — never ship them.

Layer	Mechanism	Answers
Transport	TLS — server presents a cert	encrypted? + is the server genuine?
Transport	mTLS — both sides present certs	+ is the client service genuine? (service-to-service identity)
Per-call	token in metadata (JWT / OAuth bearer)	which user is making this call? (authn/authz, verified in an interceptor §15)

import (
    "crypto/tls"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

// ===== SERVER over TLS =====
// (errors are discarded for brevity; real code should check them and refuse to start)
creds, _ := credentials.NewServerTLSFromFile("server.crt", "server.key")
s := grpc.NewServer(grpc.Creds(creds)) // every connection is now encrypted

// ===== CLIENT over TLS =====
// pool is an *x509.CertPool loaded with the CA certificate you trust.
tlsCreds := credentials.NewTLS(&tls.Config{RootCAs: pool}) // trust this CA
conn, _ := grpc.NewClient("api.example.com:443", grpc.WithTransportCredentials(tlsCreds))

// ===== Per-call token (combine with TLS) =====
// Implement credentials.PerRPCCredentials so a fresh token rides on every call:
type tokenCreds struct{ token string }
func (t tokenCreds) GetRequestMetadata(ctx context.Context, _ ...string) (map[string]string, error) {
    return map[string]string{"authorization": "Bearer " + t.token}, nil
}
func (t tokenCreds) RequireTransportSecurity() bool { return true } // refuse to send token in cleartext

conn, _ = grpc.NewClient("api.example.com:443",
    grpc.WithTransportCredentials(tlsCreds),
    grpc.WithPerRPCCredentials(tokenCreds{token}),
)

# ===== SERVER over TLS =====
with open("server.key", "rb") as k, open("server.crt", "rb") as c:
    creds = grpc.ssl_server_credentials([(k.read(), c.read())])
server.add_secure_port("[::]:443", creds)   # encrypted port

# ===== CLIENT over TLS =====
with open("ca.crt", "rb") as f:
    channel_creds = grpc.ssl_channel_credentials(root_certificates=f.read())

# ===== Per-call token, composed with the channel credentials =====
class TokenAuth(grpc.AuthMetadataPlugin):
    def __init__(self, token): self.token = token
    def __call__(self, context, callback):
        callback((("authorization", f"Bearer {self.token}"),), None)  # adds metadata per call

call_creds = grpc.metadata_call_credentials(TokenAuth(token))
composite  = grpc.composite_channel_credentials(channel_creds, call_creds)
channel    = grpc.secure_channel("api.example.com:443", composite)

Never send tokens over plaintext

Bearer tokens are like passwords — whoever holds one can impersonate the caller (echoing the JWT-theft warning from the auth manual). Always require transport security before attaching credentials. mTLS is the standard for service-to-service identity inside a mesh; user-level authz still rides on a per-call token that an interceptor verifies.
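The rule can be enforced in code rather than by convention, which is what RequireTransportSecurity returning true does in the Go example. A framework-free sketch of the same guard, where attach_bearer and its channel_is_secure flag are hypothetical names rather than part of grpcio:

```python
def attach_bearer(metadata: list, token: str, channel_is_secure: bool) -> list:
    """Return metadata plus an authorization entry, or refuse on a plaintext channel."""
    if not channel_is_secure:
        raise RuntimeError("refusing to send a bearer token over a plaintext channel")
    return metadata + [("authorization", f"Bearer {token}")]

print(attach_bearer([("x-request-id", "r1")], "abc123", channel_is_secure=True))
```

Failing loudly at the call site is deliberate: a silently dropped token looks like an auth bug, while a silently sent one is a leak.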

Resilience & Load Balancing

Because every remote call can fail in network-shaped ways, production gRPC leans on a few resilience features — mostly configured, not hand-coded. The non-obvious one is load balancing: gRPC's long-lived, multiplexed connection (§06) breaks naive load balancers, and understanding why is essential to deploying it.

The load-balancing pitfall

An L4 balancer pins every call to one backend

Fixes: an L7 proxy that balances per-request (Envoy/Linkerd), or client-side balancing where the client resolves all backends and round-robins RPCs itself. A plain TCP balancer will funnel an entire connection's traffic to one pod.
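A small simulation shows the difference in where RPCs land. This is a toy model under simple assumptions (three hypothetical pods, one client, one long-lived connection), not a description of any particular balancer.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["pod-a", "pod-b", "pod-c"]   # hypothetical backends

def l4_balanced(num_rpcs: int) -> Counter:
    """Connection-level balancing: the backend is chosen once, when the TCP
    connection opens, and every RPC multiplexed onto it lands there."""
    chosen = BACKENDS[0]                  # the one long-lived HTTP/2 connection
    return Counter(chosen for _ in range(num_rpcs))

def per_rpc_round_robin(num_rpcs: int) -> Counter:
    """Request-level balancing (L7 proxy or client-side): pick a backend per RPC."""
    picker = cycle(BACKENDS)
    return Counter(next(picker) for _ in range(num_rpcs))

print("L4     :", dict(l4_balanced(300)))
print("per-RPC:", dict(per_rpc_round_robin(300)))
```

Under the L4 model two pods sit idle however many RPCs the client sends, because the balancing decision was made before the first request.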

Retries, keepalive, health

gRPC supports declarative retries via a service config (a JSON policy attached to the channel): which status codes are retryable (typically UNAVAILABLE), how many attempts, and exponential backoff — no retry loops in your code. Keepalive pings detect dead connections and keep idle ones alive through NATs/proxies. A standard health-checking service lets load balancers and Kubernetes probes ask "are you ready?" Together these are the resilience baseline.

service-config.json — declarative retry policy (language-agnostic)

{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "2s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": [ "UNAVAILABLE" ]
    }
  }]
}
// Attach to the channel (Go: grpc.WithDefaultServiceConfig(json);
// Python: grpc.insecure_channel(target, options=[("grpc.service_config", json)]))
// Only retry IDEMPOTENT methods automatically — retrying a non-idempotent
// "charge card" can double-charge (same lesson as POST idempotency keys in REST).

Retries need idempotency

Automatic retries are safe only for idempotent methods. Retrying a "create payment" on a timeout can charge twice — the exact danger the REST manual solved with idempotency keys. Mark which methods are safe, and for the rest, use an idempotency key or accept that they aren't auto-retried.
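To see what the policy above implies in wall-clock terms, the sketch below computes the backoff ceiling before each retry and serializes the config for the channel option mentioned earlier. It is an illustration only: the helper names are hypothetical, and the random jitter that gRPC applies on top of these ceilings is not modeled.

```python
import json

service_config = {
    "methodConfig": [{
        "name": [{"service": "user.v1.UserService"}],
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "2s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
}

def seconds(duration: str) -> float:
    """Parse the '0.1s' style durations used in the policy."""
    return float(duration.rstrip("s"))

def backoff_ceilings(policy: dict) -> list:
    """Upper bound of the wait before each retry, before jitter.

    maxAttempts counts the original call, so there are maxAttempts - 1 retries.
    """
    initial = seconds(policy["initialBackoff"])
    cap = seconds(policy["maxBackoff"])
    mult = policy["backoffMultiplier"]
    return [min(initial * mult ** n, cap) for n in range(policy["maxAttempts"] - 1)]

policy = service_config["methodConfig"][0]["retryPolicy"]
print(backoff_ceilings(policy))

# The same dict, serialized, is the value the "grpc.service_config" option expects.
channel_options = [("grpc.service_config", json.dumps(service_config))]
```

Three retries with ceilings that double from 0.1 s stay well under the 2 s cap, so here the cap would only matter for a larger maxAttempts or multiplier.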

gRPC vs REST — Choosing

Not a rivalry — different tools. The honest decision rule: gRPC for internal, high-throughput, typed service-to-service traffic; REST/JSON for public, browser-facing, human-debuggable, cache-friendly APIs. Most real systems run both: gRPC between backend services, a REST (or GraphQL) edge for the outside world.

Dimension	gRPC	REST / JSON
Payload	Binary protobuf — compact, fast	Text JSON — bulky, human-readable
Contract	Enforced by `.proto`; codegen	Convention + docs (OpenAPI optional)
Transport	HTTP/2 only (multiplexed)	Any HTTP, incl. 1.1
Streaming	First-class, bidirectional	Bolt-ons (SSE, polling, WebSockets)
Browser support	No native — needs gRPC-Web (§19)	Universal
Human-debuggable	Needs tooling (grpcurl)	curl, browser, devtools
HTTP caching	Not really	Mature (ETag, Cache-Control)
Best fit	Microservices, internal, low-latency, polyglot	Public APIs, web/mobile front doors, third parties

The pragmatic architecture

A very common shape: clients hit a REST/JSON gateway over HTTP/1.1; behind it, that gateway and all internal services speak gRPC to each other. You get the public-friendliness of REST at the edge and the speed/typing of gRPC in the core — and tools like grpc-gateway (§19) can even generate the REST edge from the same proto.

gRPC-Web & Browser Interop

A hard constraint, and a frequent surprise: browsers cannot speak native gRPC. The reason is from §06 — gRPC needs fine-grained control over HTTP/2 frames and trailers, and browser fetch/XHR don't expose that. So talking to gRPC from a web frontend requires a translation layer.

Bridging to the browser

A proxy translates between browser-friendly and native gRPC

Your options, roughly in order of how much you want gRPC on the frontend:

gRPC-Web: a variant protocol plus a generated JS/TS client; a proxy (Envoy has a built-in filter) translates it to real gRPC. Streaming is limited (unary and server-streaming work; client-streaming and full bidirectional generally don't).
Connect (connectrpc): a modern protocol family that speaks gRPC, gRPC-Web, and its own HTTP/JSON, often without a separate proxy, from the same handlers. Increasingly the friendliest path.
grpc-gateway / transcoding: generate a REST+JSON facade from your .proto (via HTTP annotations). The browser uses plain REST; the gateway transcodes to gRPC. This is the "REST edge from one proto" pattern referenced in §18.
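The transcoding option can be pictured as a route table from REST paths to gRPC methods. The sketch below is a hand-written stand-in for what a gateway generates from HTTP annotations; the route, the path template, and the helper names are hypothetical, and no real proxy is involved.

```python
import json
import re

# Hypothetical route table: (HTTP verb, path template, fully qualified gRPC method).
ROUTES = [
    ("GET", "/v1/users/{id}", "user.v1.UserService/GetUser"),
]

def template_to_regex(template: str) -> re.Pattern:
    """Turn '/v1/users/{id}' into a regex with one named group per {field}."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")

def transcode(verb: str, path: str):
    """Map a REST request to (gRPC method, JSON request message), or None."""
    for route_verb, template, method in ROUTES:
        match = template_to_regex(template).match(path)
        if verb == route_verb and match:
            # Path variables become fields of the request message.
            return method, json.dumps(match.groupdict())
    return None

print(transcode("GET", "/v1/users/u42"))
```

The browser sees an ordinary REST URL, while the backend receives the same GetUser request it would get from a native gRPC client.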

Plan the edge before committing

If a browser must call your service directly, decide the bridge strategy up front — you can't just point fetch at a gRPC port. For internal-only services this never matters; for anything web-facing it's a first-class design decision.

Observability & Debugging

Binary payloads mean you can't eyeball traffic like JSON (§05), so gRPC ships an ecosystem to see inside. Knowing these turns a gRPC service from a black box into something you can poke, trace, and probe.

Tool / feature	What it gives you
`grpcurl`	The `curl` of gRPC — call methods from the CLI with JSON in/out, list services and methods.
Server reflection	Lets clients/tools discover a server's services & message schemas at runtime — so `grpcurl` works without the `.proto` on hand.
Health checking	A standard `grpc.health.v1.Health` service for readiness/liveness probes (K8s, load balancers).
channelz	Built-in introspection of live channels, subchannels, and sockets, with call counters, for debugging connectivity.
Interceptors (§15)	The hook for metrics (Prometheus), structured logs, and distributed tracing (OpenTelemetry) on every call.
Trace propagation	Pass a trace/request ID through metadata (§13) so one request is followable across every service it touches.
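Trace propagation from the last row boils down to one rule: reuse the caller's ID if there is one, mint one otherwise, and attach it to every downstream call. A minimal sketch, reusing the x-request-id key from the §13 client example; the helper names are hypothetical, and metadata is modeled as a plain list of (key, value) pairs.

```python
import uuid

REQUEST_ID_KEY = "x-request-id"   # same key as the §13 client example

def ensure_request_id(incoming: list) -> str:
    """Reuse the caller's request ID if present, otherwise mint a new one."""
    for key, value in incoming:
        if key.lower() == REQUEST_ID_KEY and value:
            return value
    return uuid.uuid4().hex

def outgoing_metadata(incoming: list) -> list:
    """Metadata to attach to every downstream call made while serving this request."""
    return [(REQUEST_ID_KEY, ensure_request_id(incoming))]

print(outgoing_metadata([("x-request-id", "req-7f3a")]))   # ID passed through
print(outgoing_metadata([]))                               # edge service mints one
```

In practice this logic lives in an interceptor (§15), so handlers never have to remember to do it.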

grpcurl — debugging from the shell

# These target a local server without TLS, hence -plaintext; drop the flag against a TLS server.

# List every service the server exposes (needs reflection enabled):
grpcurl -plaintext localhost:50051 list

# List the methods of one service:
grpcurl -plaintext localhost:50051 list user.v1.UserService

# Call a unary method with a JSON request; grpcurl turns it into protobuf for you:
grpcurl -plaintext -d '{"id": "u42"}' localhost:50051 user.v1.UserService/GetUser

# The same call against a TLS endpoint (no -plaintext):
grpcurl -d '{"id": "u42"}' api.example.com:443 user.v1.UserService/GetUser

Enable reflection in non-prod

Turn on server reflection in dev/staging so grpcurl and GUI tools (like Postman's gRPC mode or grpcui) can explore your API without you shipping .proto files around. Many teams disable it in production to avoid advertising the schema — a small attack-surface decision.

Debug Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

Concept	One-liner
gRPC	Typed remote method calls — Protobuf contract, binary serialization, HTTP/2 transport.
Protobuf (IDL)	The `.proto` is the contract; both sides generate code from it.
Field numbers	The real wire identity — never change or reuse one; adding fields is safe.
Wire format	tag = `(field << 3) | wiretype`, varint-packed; small + fast, not human-readable.
HTTP/2	One connection, many interleaved streams; one RPC = one stream.
Unary	1→1 — the normal call (~90% of RPCs).
Server stream	1→many — feeds, big result sets.
Client stream	many→1 — uploads, batch aggregation.
Bidi stream	many↔many — chat, realtime, full-duplex.
Channel vs stub	Channel = reused connection (one, shared); stub = cheap per-call caller.
Metadata	Key–value side-channel = HTTP headers; carries tokens & trace IDs.
Deadline	Absolute time bound, propagates across hops — set one on every call.
Status codes	gRPC's own enum (OK, NOT_FOUND, UNAVAILABLE…), in a trailer — not HTTP codes.
Interceptors	gRPC middleware — auth, logging, tracing, recovery; unary & stream.
Security	TLS/mTLS for the pipe + token-in-metadata for the user; never `insecure` in prod.
Load balancing	Use L7 / client-side — an L4 TCP balancer pins everything to one backend.
Retries	Declarative via service config; only auto-retry idempotent methods.
Browser	No native gRPC — use gRPC-Web, Connect, or a REST gateway.
Debug	`grpcurl` + reflection; channelz for connections; health service for probes.
vs REST	gRPC = internal/typed/fast; REST = public/debuggable/cacheable. Run both.
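The wire-format row is the one most worth working through by hand once. The sketch below computes a tag and a varint in plain Python; it assumes, purely for illustration, that id is field number 1 of the request message, and it covers only two of the wire types.

```python
VARINT, LEN = 0, 2   # two of the protobuf wire types: varint and length-delimited

def tag(field_number: int, wire_type: int) -> int:
    """Field number in the high bits, wire type in the low three."""
    return (field_number << 3) | wire_type

def encode_varint(value: int) -> bytes:
    """7 bits per byte, least-significant group first; a set high bit means more follows."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# A string field: tag, then length, then the raw bytes.
payload = b"u42"
message = encode_varint(tag(1, LEN)) + encode_varint(len(payload)) + payload

print(hex(tag(1, LEN)))
print(message)
print(encode_varint(300).hex())
```

Five bytes carry what JSON spends about a dozen characters on, which is the "small + fast, not human-readable" trade in miniature.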

The whole topic in one breath: gRPC lets a client call a server's method as if it were local. You define the API once in a .proto (§04), protoc generates a typed stub and server skeleton (§11), arguments are serialized as compact protobuf (§05) and carried over multiplexed HTTP/2 streams (§06) in one of four shapes — unary, server-, client-, or bidirectional-streaming (§07–10). A long-lived channel hosts cheap per-call stubs (§12); metadata carries auth and trace context, deadlines bound and propagate the call, cancellation reclaims work (§13); every call ends with a gRPC status code (§14); interceptors centralize cross-cutting logic (§15); TLS/mTLS plus per-call tokens secure it (§16); declarative retries and L7/client-side balancing make it resilient (§17). Reach for gRPC inside your system and REST at the public edge (§18–19), and lean on grpcurl, reflection and interceptors to see inside (§20).

Grounded in grpc.io & protobuf.dev docs · MDN for HTTP/2 background · Go 1.22+ (google.golang.org/grpc) / Python 3.11+ (grpcio) examples.