A detailed backend reference

gRPC, calling a remote function like a local one.

A first-principles walkthrough of the RPC framework that powers the internals of Google, Netflix, Uber and most modern microservice fleets — from the contract (Protocol Buffers) and the binary wire format, through the HTTP/2 transport and the four streaming shapes, to deadlines, interceptors, mTLS and production resilience. Written to explain not just what each piece does but why it exists and how it works underneath. Implementations in both Go and Python.

RPC over HTTP/2 · Protocol Buffers · Go 1.22+ · Python 3.11+ · 21 sections
Part I · Foundations
01

What gRPC Actually Is

gRPC is a framework for calling a function that lives on another machine as if it were a local function in your own process. The name expands to gRPC Remote Procedure Calls (a recursive acronym; the "g" has been backronymed to a different word in every release for fun). The phrase that matters is Remote Procedure Call. You write resp, err := client.GetUser(ctx, &pb.GetUserRequest{Id: "u42"}) and it looks like an ordinary method call, but under the hood the arguments are serialized and shipped across the network to a server, the method runs there, and the return value is shipped back, all hidden behind that one line.

Three things are true about gRPC and you need all three in your head at once: (1) it is a contract-first RPC framework, where you describe your API in a .proto file and code is generated from it; (2) it serializes data with Protocol Buffers, a compact binary format, instead of text like JSON; and (3) it transports those bytes over HTTP/2, which gives it multiplexing and real bidirectional streaming. Everything else in this manual is a consequence of those three facts.

REST hands you a building with addressable rooms (resources) and a fixed set of things you may do to each room (GET, POST, …). gRPC hands you a remote control with labelled buttons (methods) — you don't think about rooms, you press GetUser or SendPayment and the work happens elsewhere. One models nouns; the other models verbs.
Where gRPC sits
gRPC is three layers stacked: your generated API, Protobuf, and HTTP/2
[Figure: the gRPC stack, top to bottom]
Your service (YOU WORK HERE): generated stubs & servers, e.g. client.GetUser(ctx, req) · func (s) GetUser(ctx, req); the methods you call & implement
Protocol Buffers: the contract + binary serialization; messages → compact bytes · §04–05
HTTP/2: multiplexed, bidirectional framed transport; streams carry the bytes · enables all 4 call types · §06
TCP + TLS: the reliable, encrypted pipe
The top layer is the only part you hand-write. gRPC generates the glue that turns your method calls into Protobuf bytes flowing over HTTP/2 streams.

"Procedure call" is the whole idea

In ordinary code, calling a function is invisible plumbing: you pass arguments, the CPU jumps to the function, it returns a value. RPC asks a simple question — what if that function lived on a different computer? The dream of RPC (which dates back to the 1980s) is to make the network disappear, so a distributed system feels like one program. gRPC is the modern, production-grade realization of that dream: you never write socket code, never parse a response by hand, never build a URL. You call a typed method and handle a typed result or a typed error.

The one-sentence definition

gRPC = typed remote method calls, defined by a Protobuf contract, serialized as compact binary, carried over HTTP/2 streams — so a client in any language can call a server in any other language as if it were local.

02

Why gRPC Exists

gRPC isn't a replacement for REST everywhere — it was built to solve specific pains that show up when many services (often in different languages) talk to each other constantly, at high volume, inside a system. Google built it (open-sourced in 2015, evolved from an internal system called Stubby) precisely because REST-over-JSON was too slow, too loose, and too limited for service-to-service traffic at their scale. Five concrete problems drove its design.

The problem with REST/JSON | What gRPC does instead
JSON is bulky & slow. Text, with repeated field names on every object; parsing is CPU-heavy. | Protobuf binary: field numbers not names, varint-packed. Often 3–10× smaller and far faster to (de)serialize.
No enforced contract. The shape of a JSON payload lives in docs (or someone's head); drift causes runtime breakage. | The .proto file is the contract. Client & server generate code from the same source, so mismatches fail at compile time (in compiled languages) instead of in production.
Weak streaming. Plain HTTP/1.1 REST is request→response; streaming needs bolt-ons (SSE, polling, raw WebSockets). | Four call types including full bidirectional streaming, native to the framework over HTTP/2.
Polyglot friction. Every language hand-writes its own client & serialization, inconsistently. | One .proto generates idiomatic clients/servers for Go, Python, Java, C++, Rust, and more.
HTTP/1.1 connection overhead. One request at a time per connection (head-of-line blocking), or many connections with repeated handshakes. | HTTP/2 multiplexes many concurrent calls over one long-lived connection.

The honest scope

Those strengths are aimed at internal, machine-to-machine communication — microservices, backend-to-backend, mobile-to-backend where you control both ends. gRPC is weaker than REST for public, browser-facing APIs (browsers can't speak raw gRPC — see §19) and for human-debuggable, cache-friendly, broadly-compatible endpoints. The full comparison is §18; for now, hold the frame: gRPC is the internal nervous system; REST is the public front door.

Where you'll meet it

Service meshes (Istio, Linkerd), Kubernetes' own API machinery, etcd, CockroachDB, Envoy's xDS, and the internal call graphs of most large fintech/consumer platforms. If you've sent a payment or streamed a video, gRPC almost certainly carried some hop of that request between backend services.

03

The RPC Mental Model

Before any syntax, internalize the machinery that makes a remote call look local, because every gRPC concept later is just a named part of this picture. A local function call and a remote one differ in exactly one place: between "call" and "execute," the arguments have to cross a network. RPC inserts two pieces of generated code — a stub on the client and a skeleton/handler on the server — to hide that crossing.
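The stub/skeleton split can be sketched without any networking at all. What follows is a toy illustration, not how gRPC's generated code is written: JSON stands in for Protobuf, a plain function call stands in for the HTTP/2 hop, and the names UserStub, skeleton and fake_network are hypothetical.

```python
import json

# --- server side -------------------------------------------------------------
def get_user(user_id: str) -> dict:
    """The 'real' method. It knows nothing about bytes or networks."""
    return {"id": user_id, "full_name": "Ada"}

def skeleton(request_bytes: bytes) -> bytes:
    """Unmarshal the request, dispatch to the handler, marshal the reply."""
    args = json.loads(request_bytes)
    result = get_user(args["id"])
    return json.dumps(result).encode()

# --- the "network" (hypothetical stand-in for an HTTP/2 stream) --------------
def fake_network(request_bytes: bytes) -> bytes:
    return skeleton(request_bytes)

# --- client side -------------------------------------------------------------
class UserStub:
    """Looks like a local object; every method call crosses the 'wire'."""

    def get_user(self, user_id: str) -> dict:
        request_bytes = json.dumps({"id": user_id}).encode()  # marshal
        reply_bytes = fake_network(request_bytes)             # send / receive
        return json.loads(reply_bytes)                        # unmarshal

user = UserStub().get_user("u42")
print(user["full_name"])
```

The caller only ever touches UserStub.get_user; everything between the marshal and unmarshal comments is the part gRPC generates for you.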

The anatomy of a remote call
Stub and skeleton hide the network between "call" and "execute"
[Figure: anatomy of a remote call]
CLIENT PROCESS: your code calls client.GetUser(req) → STUB (generated) serializes args → bytes and sends them over HTTP/2
network: Protobuf over HTTP/2; request bytes travel to the server
SERVER PROCESS: SKELETON (generated) deserializes bytes → args and dispatches to the handler → the real method runs (query DB, compute…)
Response bytes travel the reverse path and are deserialized back into a typed return value.
"Marshalling" = turning a language object into bytes; "unmarshalling" = the reverse. The stub marshals the request and unmarshals the reply; the skeleton does the mirror image. You write neither.

The leaky-abstraction warning

RPC tries to make the network invisible — but the network is never truly invisible, and pretending otherwise is the classic RPC trap. A local call can't time out, get lost, or be reordered; a remote one can do all three. That's why gRPC gives first-class tools for the things a local call never needed: deadlines (§13), status codes for partial failure (§14), retries (§17), and cancellation. Treat every remote call as "a local call that can fail in network-shaped ways," and you'll design resilient systems instead of brittle ones.

Don't pretend the wire isn't there

The single biggest mistake with RPC is writing remote calls as if they were free and infallible — no timeout, no error handling for UNAVAILABLE, no thought about latency in a loop. The abstraction is a convenience, not a guarantee. Always pass a context/deadline and always handle the error.

Part II · The Two Pillars
04

Protocol Buffers — the Contract

Protocol Buffers (“protobuf”) is two things wearing one name: an IDL (Interface Definition Language) for describing your messages and services, and a binary serialization format for encoding them on the wire. This section is the IDL half — the .proto file that is the single source of truth both sides generate code from. Get the contract right and the client and server literally cannot disagree about the shape of the data.

A .proto file defines messages (the data structures) and services (collections of RPC methods). Here is a complete, realistic example — a user service — annotated with every rule that matters:

user.proto
syntax = "proto3";              // always declare the syntax; proto3 is current

package user.v1;                // namespace + a versioning convention (v1, v2...)

option go_package = "example.com/gen/userv1;userv1";  // where generated Go lands

// A message is a typed record. Each FIELD has a type, a name, and a NUMBER.
message User {
  string id          = 1;       // the "= 1" is the FIELD NUMBER, not a value
  string email       = 2;
  string full_name   = 3;
  Role   role        = 4;       // an enum (declared below)
  repeated string tags = 5;     // "repeated" = a list/array of strings
  int64  created_at  = 6;       // unix seconds; no scalar date type (google.protobuf.Timestamp is the usual alternative)
}

// An enum is a fixed set of named values. The first MUST be 0 (the default).
enum Role {
  ROLE_UNSPECIFIED = 0;         // 0 is the implicit default — reserve it
  ROLE_MEMBER      = 1;
  ROLE_ADMIN       = 2;
}

message GetUserRequest  { string id = 1; }
message GetUserResponse { User user = 1; }   // messages nest inside messages

message CreateUserRequest {
  string email     = 1;
  string full_name = 2;
  optional string phone = 3;    // "optional" tracks presence: set vs unset vs ""
}

// A SERVICE is a set of methods. Each takes one message and returns one message.
service UserService {
  rpc GetUser    (GetUserRequest)    returns (GetUserResponse);
  rpc CreateUser (CreateUserRequest) returns (User);
}

The rules that actually bite

  • Field numbers are the real identity, not names. On the wire, email is transmitted as field 2, never as the string "email" (this is why protobuf is compact — §05). The name is for your code; the number is the contract.
  • Never change or reuse a field number. This is the cardinal rule of schema evolution. Renaming a field is safe (names aren't on the wire); changing its number or type silently corrupts data for any peer still using the old definition. To remove a field, mark it reserved so the number can never be accidentally reused.
  • Adding fields is backward-compatible. Give a new field a fresh number; old clients simply don't see it, new servers treat it as unset when an old client omits it. This is how a gRPC API evolves without breaking deployed consumers — the same additive-change discipline you know from REST versioning.
  • proto3 has defaults, not nulls. An unset string is "", an unset int is 0, an unset bool is false. If you must distinguish "absent" from "zero," use optional (which adds presence tracking) — the same trap as Go's zero values, solved the same way (§13 of the REST/handlers manuals echoes this).
Construct | Means | Notes
scalar | int32 int64 uint32 sint32 fixed64 float double bool string bytes | Pick sint* for often-negative numbers; bytes for raw binary.
repeated | An ordered list of the field's type | The protobuf equivalent of an array/slice.
enum | A fixed value set; first entry must be 0 | Integrity + self-documentation, like a Postgres enum.
oneof | At most one of several fields is set | A tagged union, e.g. a result that is either a value or an error.
map<k,v> | An associative array | Sugar over a repeated key/value message.
optional | Adds explicit presence tracking to a scalar | Distinguishes “unset” from the zero value.
The contract is the API

In REST the contract is prose in a Swagger doc that code may or may not match. In gRPC the .proto is executable truth: both sides generate from it, so a field you added or a type you changed is reflected in both clients and servers the moment they regenerate. The schema can't silently drift from the implementation.

05

How Protobuf Encodes — the Wire Format

The reason gRPC is fast and small comes down to how protobuf turns a message into bytes. You don't hand-write this, but understanding it explains every performance claim and every gotcha. The core trick: each field is written as a tiny tag (which encodes the field number and a wire type) followed by the value — and integers are packed using varints that use fewer bytes for smaller numbers. No field names, no quotes, no commas, no whitespace.

JSON vs Protobuf, same data
Why the binary form is dramatically smaller
[Figure: the same record in both encodings]
JSON (text, names repeated every time): {"id":"u42","role":2,"created_at":1717… ; every key, quote, brace and comma is bytes on the wire (~52 B)
Protobuf (field numbers + varints, no names): 0A 03 'u' '4' '2' · 20 02 · 30 …varint ; tag, len, value for the string, then tag, value for each integer (~20 B)
Same information, well under half the size, and no text parsing to decode it.
Multiply this saving across millions of messages per second and you see why high-traffic internal systems reach for protobuf over JSON.
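To make the comparison concrete, here is a hand-rolled encoder for three of the User fields from §04. It is a sketch for illustration only: real code uses the generated classes, the helper names are made up, and the created_at value is an arbitrary timestamp, so the byte counts it prints will not match the figure's rounded numbers.

```python
import json

def encode_varint(n: int) -> bytes:
    """Base-128 varint for a non-negative int: 7 data bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)

VARINT, LEN = 0, 2  # the two wire types this example needs

def encode_user(user_id: str, role: int, created_at: int) -> bytes:
    raw_id = user_id.encode("utf-8")
    return (
        tag(1, LEN) + encode_varint(len(raw_id)) + raw_id  # string id = 1
        + tag(4, VARINT) + encode_varint(role)             # Role role = 4
        + tag(6, VARINT) + encode_varint(created_at)       # int64 created_at = 6
    )

record = {"id": "u42", "role": 2, "created_at": 1717000000}  # timestamp is made up
as_json = json.dumps(record, separators=(",", ":")).encode()
as_proto = encode_user(record["id"], record["role"], record["created_at"])

print(len(as_json), "bytes of JSON")
print(len(as_proto), "bytes of Protobuf:", as_proto.hex(" "))
```

The first bytes of the Protobuf output are 0a 03 75 34 32 20 02 30, matching the tag, length and value layout in the figure.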

The tag: field number + wire type in one byte (usually)

Each field on the wire begins with a tag computed as (field_number << 3) | wire_type. The low 3 bits are the wire type (how to read the bytes that follow: varint, 64-bit, length-delimited, 32-bit); the rest is the field number. So the decoder reads the tag, learns "this is field 2, and it's length-delimited," and knows exactly how to consume what comes next — even if it has never seen that field before (it can skip unknown fields, which is what makes forward-compatibility work).

Decoding one field
How a tag byte tells the reader what to do next
[Figure: anatomy of a tag byte]
tag byte = (field_number << 3) | wire_type; the field number occupies the upper bits, the wire type the low 3 bits
Wire types: 0 = VARINT (int32/64, bool, enum) · 1 = I64 (fixed64, double) · 2 = LEN (string, bytes, messages) · 5 = I32 (fixed32, float)
Varint: small numbers cost fewer bytes: 1 → 1 byte · 300 → 2 bytes · 70000 → 3 bytes. Each byte uses 7 bits for data + 1 “continue” bit, so values grow their encoding only as needed.
Because the reader can identify and skip a field it doesn't recognize, a new server can add fields without breaking old clients — the wire format is self-describing enough to step over the unknown.
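The skip-what-you-don't-know behaviour is easy to see in a small decoder. This is a simplified sketch under stated assumptions: it handles only the four wire types listed above, returns raw bytes for length-delimited and fixed-width fields, and takes the "schema" as a plain dict of field number to name, which is a stand-in for generated code.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, pos
        shift += 7

def decode(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Walk the fields in buf, keeping known ones and stepping over the rest."""
    out: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: always 8 bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: varint length, then that many bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: always 4 bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:  # unknown field numbers are consumed and dropped
            out[known[field_number]] = value
    return out

# id = "u42" (field 1), role = 2 (field 4), plus a field 9 this reader predates.
wire = bytes.fromhex("0a03753432" "2002" "4a036e6577")
old_schema = {1: "id", 4: "role"}
print(decode(wire, old_schema))
```

The reader never needed to know what field 9 means; the wire type in its tag was enough to find where the next field starts.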
The cost of binary: it isn't human-readable

You can't curl a gRPC endpoint and eyeball the JSON. Debugging needs tools that understand the schema (grpcurl, reflection — §20). That opacity is the price of the speed and size; it's the main reason public/debuggable APIs often stay on REST.

06

HTTP/2 — the Transport Underneath

gRPC doesn't invent its own transport — it rides on HTTP/2, and almost every gRPC superpower (concurrency, all four streaming shapes, low overhead) is really an HTTP/2 feature. The thing to understand is HTTP/2's central idea: a single TCP connection is divided into many independent, interleaved streams, each carrying a sequence of binary frames. That's what lets one connection carry hundreds of concurrent RPCs — and lets data flow in both directions at once.

Multiplexing
One connection, many concurrent RPCs interleaved as frames
[Figure: HTTP/1.1 vs HTTP/2]
HTTP/1.1: one call blocks the line (or needs many connections); call A runs while calls B and C wait
HTTP/2: one connection, 4 streams (A, B, C, D) interleaved; frames from different calls share the wire, in any order
No head-of-line blocking at the HTTP layer: a slow call never freezes the others on the connection.
A gRPC call maps to one HTTP/2 stream. Many calls → many streams → one connection. This is why a gRPC client keeps a long-lived connection and reuses it.

How a gRPC call maps onto HTTP/2

Concretely, an RPC is an HTTP/2 POST to a path shaped like /package.Service/Method (e.g. /user.v1.UserService/GetUser). gRPC metadata travels as HTTP/2 headers; the serialized protobuf travels in DATA frames as the body; and the final status (the gRPC status code — §14) arrives in HTTP/2 trailers after the body. Streaming simply means more than one message flows in one or both directions on that stream before it closes.

gRPC concept | HTTP/2 mechanism
One RPC call | One HTTP/2 stream (request + response)
Method being called | :path header = /pkg.Service/Method
Metadata (auth tokens, trace IDs) | HTTP/2 request & response headers
The request/response message(s) | length-prefixed protobuf in DATA frames
Final status + message | grpc-status / grpc-message trailers
Streaming | multiple length-prefixed messages in DATA frames before the stream half-closes
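The "length-prefixed" row is worth seeing in bytes. In the gRPC-over-HTTP/2 protocol each message is preceded by a 5-byte prefix: a 1-byte compressed flag and a 4-byte big-endian length. The helpers below (grpc_path, frame, unframe) are illustrative names, not part of any gRPC library, and the sketch assumes it is handed only complete frames.

```python
import struct

def grpc_path(package: str, service: str, method: str) -> str:
    """Build the :path pseudo-header value for a call."""
    return f"/{package}.{service}/{method}"

def frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix one serialized message: 1-byte compressed flag + 4-byte big-endian length."""
    return struct.pack(">BI", int(compressed), len(message)) + message

def unframe(data: bytes) -> list[bytes]:
    """Split a byte string holding zero or more complete frames into messages."""
    messages, pos = [], 0
    while pos < len(data):
        _flag, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        messages.append(data[pos:pos + length])
        pos += length
    return messages

body = frame(b"\x0a\x03u42")              # a unary request body: exactly one frame
streamed = frame(b"one") + frame(b"two")  # a streaming body: several frames back to back

print(grpc_path("user.v1", "UserService", "GetUser"))
print(body.hex(" "))
print(unframe(streamed))
```

Unary and streaming calls differ only in how many of these frames appear on the stream before it closes.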
Why this matters for you

Because gRPC needs HTTP/2 end to end, any proxy/load balancer in the path must speak HTTP/2 and do L7 (request-aware) balancing — a naive L4 TCP balancer will pin every call to one backend, since they all share one connection (§17). And because browsers can't expose raw HTTP/2 framing to JS, browsers can't speak native gRPC at all (§19). The transport choice shapes the whole deployment story.
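The L4-versus-L7 point can be shown with a small simulation. This is a toy model with hypothetical backend names and plain round-robin picking, not a description of any particular load balancer.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["b1", "b2", "b3"]

def l4_balance(num_connections: int, calls_per_connection: int) -> Counter:
    """Connection-level balancing: the backend is chosen once per TCP connection."""
    picker, load = cycle(BACKENDS), Counter()
    for _ in range(num_connections):
        backend = next(picker)                 # decided at connect time...
        load[backend] += calls_per_connection  # ...and every call follows it
    return load

def l7_balance(num_connections: int, calls_per_connection: int) -> Counter:
    """Request-level balancing: the backend is chosen once per RPC."""
    picker, load = cycle(BACKENDS), Counter()
    for _ in range(num_connections * calls_per_connection):
        load[next(picker)] += 1
    return load

# One gRPC client, one long-lived connection, 900 calls.
print("L4:", dict(l4_balance(1, 900)))
print("L7:", dict(l7_balance(1, 900)))
```

With a single long-lived connection the L4 model sends all 900 calls to b1, while the L7 model spreads them 300 per backend.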

Part III · The Four Call Types
07

Unary RPC

The simplest and most common shape: one request, one response, exactly like a normal function call. The client sends a single message, the server does its work and returns a single message. The large majority of real-world RPCs are unary. We'll build the GetUser method from the user.proto in §04, in full, in both languages.

Unary
One message each way
[Figure: Client → Server, 1 request; Server → Client, 1 response]
// ===== SERVER =====
package main

import (
    "context"
    "log"
    "net"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/status"
    userv1 "example.com/gen/userv1" // generated from user.proto
)

// Embed the generated UnimplementedUserServiceServer for forward-compat.
type server struct {
    userv1.UnimplementedUserServiceServer
}

// The method signature is generated FROM the proto: ctx, *Request -> *Response, error
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    if req.GetId() == "" {
        return nil, status.Error(codes.InvalidArgument, "id is required") // typed error, §14
    }
    // ...real work: query the DB by req.GetId()...
    u := &userv1.User{Id: req.GetId(), Email: "ada@example.com", FullName: "Ada", Role: userv1.Role_ROLE_ADMIN}
    return &userv1.GetUserResponse{User: u}, nil
}

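func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("listen: %v", err)
    }
    s := grpc.NewServer()
    userv1.RegisterUserServiceServer(s, &server{}) // wire the impl to the service
    log.Println("gRPC on :50051")
    if err := s.Serve(lis); err != nil { // blocks until the server stops
        log.Fatalf("serve: %v", err)
    }
}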

// ===== CLIENT =====
func callGetUser() {
    // NewClient replaces the deprecated grpc.Dial; insecure creds for local dev only
    conn, err := grpc.NewClient("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("new client: %v", err)
    }
    defer conn.Close()

    client := userv1.NewUserServiceClient(conn) // the generated STUB
    ctx, cancel := context.WithTimeout(context.Background(), time.Second) // always a deadline, §13
    defer cancel()

    resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"}) // looks local, runs remote
    if err != nil {
        log.Fatalf("GetUser failed: %v", err) // err carries the gRPC status code
    }
    log.Printf("got user: %s", resp.GetUser().GetFullName())
}
# ===== SERVER =====
from concurrent import futures
import grpc
import user_pb2 as pb          # generated: messages
import user_pb2_grpc as pb_grpc  # generated: service base classes

class UserService(pb_grpc.UserServiceServicer):
    # Method signature generated FROM the proto: (self, request, context) -> response
    def GetUser(self, request, context):
        if not request.id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")  # typed error, §14
        # ...real work: query the DB by request.id...
        user = pb.User(id=request.id, email="ada@example.com",
                       full_name="Ada", role=pb.ROLE_ADMIN)
        return pb.GetUserResponse(user=user)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb_grpc.add_UserServiceServicer_to_server(UserService(), server)  # wire impl to service
    server.add_insecure_port("[::]:50051")  # insecure = local dev only
    server.start()
    print("gRPC on :50051")
    server.wait_for_termination()

# ===== CLIENT =====
def call_get_user():
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = pb_grpc.UserServiceStub(channel)        # the generated STUB
        try:
            # timeout= is the deadline, §13; looks local, runs remote
            resp = stub.GetUser(pb.GetUserRequest(id="u42"), timeout=1.0)
            print("got user:", resp.user.full_name)
        except grpc.RpcError as e:
            print("failed:", e.code(), e.details())     # carries the gRPC status code

if __name__ == "__main__":
    serve()

Notice the symmetry across languages: a generated stub on the client, a generated servicer/server base class on the server, and a method whose exact signature came from the proto.

08

Server Streaming

One request, a stream of responses. The client asks once; the server sends back many messages over time, then closes the stream. Perfect for: returning a large result set in chunks, a live feed of events, progress updates on a long job, or paginating without repeated round trips. The proto marks the response as stream.

Server streaming
Ask once, receive many
[Figure: Client → Server, 1 request; Server → Client, many responses, then close]
// proto:  rpc ListUsers(ListUsersRequest) returns (stream User);

// ===== SERVER: receive one req, call stream.Send(...) repeatedly =====
func (s *server) ListUsers(req *userv1.ListUsersRequest, stream userv1.UserService_ListUsersServer) error {
    for _, u := range queryUsers(req.GetFilter()) { // imagine this yields a big result set
        if err := stream.Send(u); err != nil {       // push one message down the stream
            return err                                // client gone / cancelled
        }
    }
    return nil // returning nil closes the stream cleanly (sends OK trailer)
}

// ===== CLIENT: call once, then Recv() in a loop until io.EOF =====
func listUsers(client userv1.UserServiceClient) {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    stream, err := client.ListUsers(ctx, &userv1.ListUsersRequest{Filter: "active"})
    if err != nil { log.Fatal(err) } // a nil stream would panic on Recv
    for {
        u, err := stream.Recv()
        if err == io.EOF { break } // server closed the stream — we're done
        if err != nil { log.Fatal(err) }
        log.Printf("user: %s", u.GetFullName())
    }
}
# proto:  rpc ListUsers(ListUsersRequest) returns (stream User);

# ===== SERVER: a generator — every `yield` sends one message =====
class UserService(pb_grpc.UserServiceServicer):
    def ListUsers(self, request, context):
        for u in query_users(request.filter):  # big result set
            yield u                             # yielding pushes one message down the stream
        # function returning ends the stream cleanly

# ===== CLIENT: the call returns an ITERABLE of responses =====
def list_users(stub):
    responses = stub.ListUsers(pb.ListUsersRequest(filter="active"), timeout=10.0)
    for u in responses:        # iterate until the server closes the stream
        print("user:", u.full_name)

Go uses an explicit stream.Send/stream.Recv pair; Python expresses the server side as a generator (yield) and the client side as a plain iterable — idiomatic to each language, same wire behavior.

09

Client Streaming

A stream of requests, one response. The client sends many messages, then the server replies once with a summary/result. Ideal for uploads, batch ingestion, or aggregating a series of readings into a single computed answer. The proto marks the request as stream.

Client streaming
Send many, get one back
[Figure: Client → Server, many requests, then close; Server → Client, 1 response]
// proto:  rpc UploadEvents(stream Event) returns (UploadSummary);

// ===== SERVER: Recv() in a loop, then SendAndClose() once at the end =====
func (s *server) UploadEvents(stream userv1.UserService_UploadEventsServer) error {
    count := 0
    for {
        ev, err := stream.Recv()
        if err == io.EOF { // client finished sending — now reply once
            return stream.SendAndClose(&userv1.UploadSummary{Received: int32(count)})
        }
        if err != nil { return err }
        store(ev)
        count++
    }
}

// ===== CLIENT: Send() many, then CloseAndRecv() for the single reply =====
func uploadEvents(client userv1.UserServiceClient, events []*userv1.Event) {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    stream, err := client.UploadEvents(ctx)
    if err != nil { log.Fatal(err) }
    for _, ev := range events {
        if err := stream.Send(ev); err != nil { break } // stream ended early; CloseAndRecv reports why
    }
    summary, err := stream.CloseAndRecv() // close our side, await the summary
    if err != nil { log.Fatal(err) }
    log.Printf("server received %d events", summary.GetReceived())
}
# proto:  rpc UploadEvents(stream Event) returns (UploadSummary);

# ===== SERVER: request_iterator yields incoming messages; return once =====
class UserService(pb_grpc.UserServiceServicer):
    def UploadEvents(self, request_iterator, context):
        count = 0
        for ev in request_iterator:   # iterate the client's stream
            store(ev)
            count += 1
        return pb.UploadSummary(received=count)  # single reply after client closes

# ===== CLIENT: pass an iterable/generator of requests; get one response =====
def upload_events(stub, events):
    def gen():
        for ev in events:
            yield ev
    summary = stub.UploadEvents(gen(), timeout=30.0)  # returns the single summary
    print("server received", summary.received, "events")

The asymmetry is the point: SendAndClose/CloseAndRecv in Go, a request_iterator plus a normal return in Python — the server consumes the whole request stream before producing its one answer.

10

Bidirectional Streaming

A stream of requests and a stream of responses, simultaneously and independently. Both sides read and write on the same stream at the same time, in any order — this is what HTTP/2's full-duplex framing (§06) buys you. It's the shape behind chat, real-time collaboration, live telemetry with control messages, and interactive sessions. Both request and response are stream in the proto.

Bidirectional streaming
Both sides send and receive at once
[Figure: Client ↔ Server, interleaved & independent; neither side waits its turn]
// proto:  rpc Chat(stream ChatMessage) returns (stream ChatMessage);

// ===== SERVER: loop Recv() and Send() on the same stream =====
func (s *server) Chat(stream userv1.UserService_ChatServer) error {
    for {
        msg, err := stream.Recv()
        if err == io.EOF { return nil } // client closed its send side
        if err != nil { return err }
        // echo back (or broadcast to a room, etc.) — can Send anytime, any number
        reply := &userv1.ChatMessage{User: "server", Text: "ack: " + msg.GetText()}
        if err := stream.Send(reply); err != nil { return err }
    }
}

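// ===== CLIENT: typically Send in one goroutine, Recv in another =====
func chat(client userv1.UserServiceClient) {
    ctx, cancel := context.WithTimeout(context.Background(), time.Minute) // bound the session, §13
    defer cancel()
    stream, err := client.Chat(ctx)
    if err != nil { log.Fatal(err) }
    done := make(chan struct{})
    go func() { // concurrent receiver
        defer close(done)
        for {
            in, err := stream.Recv()
            if err != nil { return } // io.EOF when the server finishes, or a real error
            log.Printf("<< %s", in.GetText())
        }
    }()
    for _, t := range []string{"hi", "how are you", "bye"} {
        if err := stream.Send(&userv1.ChatMessage{User: "ada", Text: t}); err != nil { break }
    }
    stream.CloseSend() // signal we're done sending
    <-done             // wait for the receiver to drain the remaining replies
}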
# proto:  rpc Chat(stream ChatMessage) returns (stream ChatMessage);

# ===== SERVER: iterate incoming, yield outgoing — interleaved =====
class UserService(pb_grpc.UserServiceServicer):
    def Chat(self, request_iterator, context):
        for msg in request_iterator:          # read the client's stream
            yield pb.ChatMessage(user="server", text="ack: " + msg.text)  # write back

# ===== CLIENT: pass a request generator; iterate the response stream =====
def chat(stub):
    def outgoing():
        for t in ["hi", "how are you", "bye"]:
            yield pb.ChatMessage(user="ada", text=t)
    responses = stub.Chat(outgoing())   # both directions live at once
    for reply in responses:
        print("<<", reply.text)

# (for true concurrency under load, prefer the async API: grpc.aio)
Streaming gotchas to respect

Streams are long-lived, so each one holds an HTTP/2 stream plus a goroutine (or worker thread) for its lifetime: always set deadlines or idle limits, and handle the client vanishing mid-stream. HTTP/2 flow control is the only built-in back-pressure, so a fast producer will stall or pile up buffered work against a slow consumer unless you pace it at the application level. And a single stream is ordered but not a transaction: if it breaks halfway, you've delivered a prefix, so design messages to be resumable or idempotent.
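One common way to make a stream resumable is to number the messages and let the client reconnect from the last sequence number it handled. The sketch below simulates that with a plain generator; subscribe, StreamBroken and the fail_after knob are hypothetical stand-ins for a real server-streaming call and a real connection failure.

```python
from typing import Iterator, List, Optional, Tuple

EVENTS = [f"event-{i}" for i in range(6)]

class StreamBroken(Exception):
    pass

def subscribe(start_at: int, fail_after: Optional[int] = None) -> Iterator[Tuple[int, str]]:
    """Hypothetical server stream: yields (sequence_number, payload) from start_at.

    fail_after simulates the connection dropping after that many messages.
    """
    for sent, seq in enumerate(range(start_at, len(EVENTS))):
        if fail_after is not None and sent == fail_after:
            raise StreamBroken("connection lost mid-stream")
        yield seq, EVENTS[seq]

def consume_all() -> List[str]:
    received: List[str] = []
    next_seq = 0                      # the client's resume cursor
    fail_after: Optional[int] = 3     # the first attempt breaks after 3 messages
    while next_seq < len(EVENTS):
        try:
            for seq, payload in subscribe(next_seq, fail_after):
                received.append(payload)
                next_seq = seq + 1    # advance only after handling the message
        except StreamBroken:
            fail_after = None         # retry; this time the stream stays up
    return received

print(consume_all())
```

Because the cursor moves only after a message is handled, the retry picks up at event-3 and the client ends with each event exactly once.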

The four shapes at a glance
Pick by how many messages flow each way
[Figure: the four call shapes]
Unary: 1 → 1 (normal call)
Server stream: 1 → many (feeds, large results)
Client stream: many → 1 (uploads, batch)
Bidirectional: many ↔ many (chat, realtime)
All four ride the same HTTP/2 stream machinery; the proto's two stream keywords decide the shape: rpc M(Req) returns (Res), rpc M(stream Req) returns (Res), rpc M(Req) returns (stream Res), or both.
Part IV · The Machinery
11

The .proto → Code Workflow

You never write the stubs and skeletons by hand — a compiler, protoc (or the buf toolchain), reads your .proto and emits idiomatic source for each target language via a language-specific plugin. This is the step that turns the contract into callable code, and it's the reason a Go client and a Python server can interoperate flawlessly: both were generated from the same file.

Code generation
One contract, many generated clients and servers
[Figure: code generation pipeline]
user.proto (the contract) → protoc + lang plugins (protoc-gen-go, grpc_tools) → Go: *.pb.go (messages + stub + server base) and Py: *_pb2*.py (messages + stub + servicer)
Regenerate after every proto change; the generated files are build artifacts, committed or generated in CI.
// Install the two plugins once (they sit on your PATH):
//   go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
//   go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

// Generate messages (--go_out) AND the service stubs/skeleton (--go-grpc_out):
//   protoc \
//     --go_out=. --go_opt=paths=source_relative \
//     --go-grpc_out=. --go-grpc_opt=paths=source_relative \
//     user.proto

// Produces:
//   user.pb.go        — structs for each message (+ getters)
//   user_grpc.pb.go   — UserServiceClient (stub) + UserServiceServer (to implement)

// Then in code you simply import the generated package:
//   import userv1 "example.com/gen/userv1"
# Install the tooling once:
#   pip install grpcio grpcio-tools

# Generate messages AND service stubs in one command:
#   python -m grpc_tools.protoc \
#       -I. \
#       --python_out=. \
#       --grpc_python_out=. \
#       user.proto

# Produces:
#   user_pb2.py       — message classes
#   user_pb2_grpc.py  — UserServiceStub (client) + UserServiceServicer (to implement)

# Then import both in your code:
#   import user_pb2 as pb
#   import user_pb2_grpc as pb_grpc
Use buf in real projects

Raw protoc invocations get unwieldy fast. The buf toolchain wraps it with a config file, dependency management, breaking-change detection (it fails CI if you'd violate the never-reuse-a-field-number rule), and a linter. For anything beyond a toy, prefer buf — it turns the schema-evolution discipline from §04 into an automated gate.

12

Channels, Stubs & a Call End-to-End

Two client-side objects matter, and people conflate them. A channel (Go calls it a ClientConn) is the long-lived, reusable connection to a server — it manages the underlying HTTP/2 connection(s), reconnection, and load-balancing state. A stub is the cheap, generated object you create on top of a channel to actually call methods. The rule: create the channel once and share it; create stubs freely. Opening a channel per request destroys performance — you throw away the connection reuse that was the whole point of HTTP/2.

A unary call, end to end
Every step from your method call to the typed reply
1 · you call stub.GetUser(ctx, req)
2 · stub serializes req → protobuf
3 · client interceptors run
4 · HTTP/2 stream → :path, DATA
5 · server reads frames
6 · deserialize + interceptors
7 · YOUR handler runs
8 · serialize reply + status

The reply retraces the path: serialized on the server, framed over the same stream, deserialized by the stub, and handed back to you as a typed value, or a typed error carrying a status code.

channel = the reused connection · stub = the cheap per-call caller on top of it
The performance rule

One channel per server, shared across your whole app, for its whole lifetime. Stubs are throwaway. If your latency is mysteriously bad, the first thing to check is whether you're creating a channel (and thus a fresh HTTP/2 + TLS handshake) on every call.
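The rule can be sketched without gRPC installed. The block below is a minimal illustration, not library code: ChannelCache and fake_channel are hypothetical names, and the factory stands in for whatever really opens a connection (in a real client that would be something like grpc.secure_channel).

```python
import threading
from typing import Callable, Dict


class ChannelCache:
    """Hands out one long-lived channel per target, creating it on first use."""

    def __init__(self, factory: Callable[[str], object]) -> None:
        self._factory = factory            # opens a connection; injected so the sketch is testable
        self._channels: Dict[str, object] = {}
        self._lock = threading.Lock()      # handlers on many threads may ask at once

    def get(self, target: str) -> object:
        with self._lock:
            if target not in self._channels:
                self._channels[target] = self._factory(target)
            return self._channels[target]


# Stand-in factory (an assumption for this sketch): records each "handshake".
handshakes = []


def fake_channel(target: str) -> str:
    handshakes.append(target)
    return f"channel<{target}>"


cache = ChannelCache(fake_channel)
for _ in range(1000):
    channel = cache.get("users.internal:50051")   # the same object every time
    # A stub would be built here on top of `channel`; that part is cheap.

print(len(handshakes))   # 1
```

A thousand calls cost one connection setup. Building the channel inside the loop instead would pay that setup a thousand times, which is the latency mystery the rule warns about.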

13

Metadata, Deadlines & Cancellation

These are the tools that acknowledge the network is real (the §03 warning, made concrete). Metadata is gRPC's key–value side-channel — the equivalent of HTTP headers — for things that aren't the message itself: auth tokens, trace/request IDs, API versions. Deadlines put an absolute time bound on a call and, crucially, propagate across hops. Cancellation lets a caller (or a broken connection) abort in-flight work so servers don't toil on results nobody wants.

Deadlines beat timeouts — and they propagate

A gRPC deadline is an absolute point in time, not a per-hop duration. When service A calls B with a 1-second deadline and B calls C, the remaining budget travels along — so C knows it has, say, 600ms left, not a fresh second. This prevents the classic cascade where each layer waits its own full timeout and total latency balloons. Always set a deadline on every call. A call with no deadline can hang forever, pinning resources.
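To make the shrinking budget concrete, here is a small stdlib-only sketch. Deadline and FakeClock are hypothetical helpers written for this illustration; they model the idea, not how any gRPC library stores a deadline internally.

```python
import time
from typing import Callable


class Deadline:
    """An absolute expiry time, fixed once by the original caller."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left for everyone downstream; never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0


class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
deadline = Deadline(1.0, clock)     # A gives the whole chain one second

clock.advance(0.25)                 # time spent reaching B and inside B
budget_for_b = deadline.remaining() # 0.75

clock.advance(0.125)                # time spent before B calls C
budget_for_c = deadline.remaining() # 0.625, not a fresh 1.0

clock.advance(1.0)                  # somebody was slow
print(budget_for_b, budget_for_c, deadline.expired())   # 0.75 0.625 True
```

With independent per-hop timeouts of one second each, the same three-service chain could take up to three seconds before A learns anything. With one shared deadline the whole chain is bounded by the original second.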

// ===== CLIENT: attach a deadline + metadata =====
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second) // absolute deadline
defer cancel() // cancel frees resources whether we time out or finish early

ctx = metadata.AppendToOutgoingContext(ctx,
    "authorization", "Bearer "+token,   // auth travels as metadata, §16
    "x-request-id", reqID)              // trace id for correlation, like §15 of the layers manual

resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"})
if status.Code(err) == codes.DeadlineExceeded {
    log.Println("call timed out") // a network-shaped failure a local call never had
}

// ===== SERVER: read metadata, respect the inherited deadline =====
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    md, _ := metadata.FromIncomingContext(ctx)
    auth := md.Get("authorization") // verify token here (or in an interceptor, §15)

    // ctx already carries the client's remaining deadline + cancellation —
    // pass it straight to the DB driver so slow work is abandoned automatically.
    if ctx.Err() != nil { return nil, status.FromContextError(ctx.Err()).Err() }
    _ = auth
    return s.lookup(ctx, req.GetId())
}
# ===== CLIENT: attach a deadline (timeout=) + metadata =====
metadata = (
    ("authorization", f"Bearer {token}"),  # auth as metadata, §16
    ("x-request-id", req_id),               # trace id for correlation
)
try:
    resp = stub.GetUser(pb.GetUserRequest(id="u42"),
                        timeout=1.0,         # the deadline, in seconds
                        metadata=metadata)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("call timed out")             # a network-shaped failure

# ===== SERVER: read metadata, check for cancellation =====
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        md = dict(context.invocation_metadata())
        auth = md.get("authorization")       # verify token (or in an interceptor, §15)

        if not context.is_active():          # client gone / deadline passed?
            return pb.GetUserResponse()      # abandon the work
        return self.lookup(request.id)
Pass the context down, always

In Go, thread the incoming ctx into every downstream call (DB queries, outbound RPCs). That's what makes deadlines and cancellation actually work — if the client hangs up, the cancellation ripples all the way down and frees everything. A handler that ignores ctx keeps grinding on abandoned work.
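Python has no ctx to thread through, so a handler usually forwards the budget by hand. A minimal sketch, assuming the servicer context's time_remaining() method (which returns None when the caller set no deadline); downstream_timeout, its 5-second cap, and FakeContext are illustrative choices, not part of grpcio.

```python
from typing import Optional


def downstream_timeout(context, cap_s: float = 5.0) -> float:
    """Timeout to pass to a DB query or outbound RPC made inside a handler."""
    remaining: Optional[float] = context.time_remaining()
    if remaining is None:
        return cap_s                        # caller set no deadline: impose our own bound
    return max(0.0, min(remaining, cap_s))  # never exceed what the caller has left


class FakeContext:
    """Test double exposing the single context method the helper relies on."""

    def __init__(self, remaining: Optional[float]) -> None:
        self._remaining = remaining

    def time_remaining(self) -> Optional[float]:
        return self._remaining


print(downstream_timeout(FakeContext(0.6)))    # 0.6
print(downstream_timeout(FakeContext(None)))   # 5.0
print(downstream_timeout(FakeContext(30.0)))   # 5.0
```

Inside a real handler the result would be passed as the timeout= argument of the next stub call, so the downstream service sees the caller's remaining time rather than a fresh allowance.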

14

Status Codes & Error Handling

gRPC does not use HTTP status codes. It has its own fixed set of status codes, an enum of 17 values (numbered 0 through 16), sent in the grpc-status trailer (§06). Every call ends with exactly one: OK for success, or one of the error codes with an optional message. Returning the right code is part of your contract, because clients (and retry policies, §17) switch on it. The table below covers the twelve you will meet most often.

| Code | Means | Closest HTTP analogue |
| --- | --- | --- |
| OK | Success | 200 |
| INVALID_ARGUMENT | Client sent bad input (independent of system state) | 400 |
| UNAUTHENTICATED | No / invalid credentials | 401 |
| PERMISSION_DENIED | Authenticated but not allowed | 403 |
| NOT_FOUND | The requested entity doesn't exist | 404 |
| ALREADY_EXISTS | Create conflicts with an existing entity | 409 |
| FAILED_PRECONDITION | System not in a state for the operation | 400/409 |
| RESOURCE_EXHAUSTED | Quota / rate limit hit | 429 |
| DEADLINE_EXCEEDED | Call ran past its deadline | 504 |
| UNAVAILABLE | Transient: server down/overloaded; safe to retry | 503 |
| INTERNAL | A real bug / invariant broken | 500 |
| UNIMPLEMENTED | Method not implemented on this server | 501 |
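For reference, the full enum with its numeric values, plus the table's HTTP analogues as a lookup. This is a stdlib sketch: the Code class mirrors the canonical gRPC numbering, while http_analogue and its 500 fallback for codes the table does not list are assumptions made for this example (a gateway might map them differently).

```python
from enum import IntEnum
from http import HTTPStatus


class Code(IntEnum):
    """The gRPC status codes and the integers sent in the grpc-status trailer."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


# Straight from the table above; FAILED_PRECONDITION is listed as 400/409, 400 chosen here.
HTTP_ANALOGUE = {
    Code.OK: HTTPStatus.OK,
    Code.INVALID_ARGUMENT: HTTPStatus.BAD_REQUEST,
    Code.UNAUTHENTICATED: HTTPStatus.UNAUTHORIZED,
    Code.PERMISSION_DENIED: HTTPStatus.FORBIDDEN,
    Code.NOT_FOUND: HTTPStatus.NOT_FOUND,
    Code.ALREADY_EXISTS: HTTPStatus.CONFLICT,
    Code.FAILED_PRECONDITION: HTTPStatus.BAD_REQUEST,
    Code.RESOURCE_EXHAUSTED: HTTPStatus.TOO_MANY_REQUESTS,
    Code.DEADLINE_EXCEEDED: HTTPStatus.GATEWAY_TIMEOUT,
    Code.UNAVAILABLE: HTTPStatus.SERVICE_UNAVAILABLE,
    Code.INTERNAL: HTTPStatus.INTERNAL_SERVER_ERROR,
    Code.UNIMPLEMENTED: HTTPStatus.NOT_IMPLEMENTED,
}


def http_analogue(code: Code) -> HTTPStatus:
    """Closest HTTP status for a gRPC code; unlisted codes fall back to 500 (assumption)."""
    return HTTP_ANALOGUE.get(code, HTTPStatus.INTERNAL_SERVER_ERROR)


print(len(Code), int(http_analogue(Code.NOT_FOUND)))   # 17 404
```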
import (
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// SERVER: return a typed status, not a bare error string.
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
    if req.GetId() == "" {
        return nil, status.Error(codes.InvalidArgument, "id is required")
    }
    u, found := s.db.Find(req.GetId())
    if !found {
        return nil, status.Errorf(codes.NotFound, "no user with id %q", req.GetId())
    }
    return &userv1.GetUserResponse{User: u}, nil
}

// CLIENT: inspect the code to decide what to do.
resp, err := client.GetUser(ctx, req)
if err != nil {
    st := status.Convert(err)        // pull the status out of the error
    switch st.Code() {
    case codes.NotFound:        // expected — show "not found" in UI
    case codes.Unavailable:     // transient — retry with backoff, §17
    default:                    log.Printf("unexpected: %v: %s", st.Code(), st.Message())
    }
}
import grpc

# SERVER: abort with a code + message (or set_code/set_details then return).
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        if not request.id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")
        user = self.db.find(request.id)
        if user is None:
            context.abort(grpc.StatusCode.NOT_FOUND, f"no user with id {request.id!r}")
        return pb.GetUserResponse(user=user)

# CLIENT: catch RpcError and switch on .code()
try:
    resp = stub.GetUser(req, timeout=1.0)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.NOT_FOUND:
        ...   # expected — show "not found"
    elif e.code() == grpc.StatusCode.UNAVAILABLE:
        ...   # transient — retry with backoff, §17
    else:
        print("unexpected:", e.code(), e.details())
Rich, structured errors

When a code + message isn't enough (e.g. per-field validation details, like the REST error envelope), gRPC supports error details — typed protobuf messages attached to the status (the google.rpc types: BadRequest, QuotaFailure, RetryInfo…). The client deserializes them as structured data, never by parsing a human string — same principle as the machine-readable code field in the REST manual's error envelope.

15

Interceptors — the Middleware of gRPC

If you read the handlers/services/middleware chapter, this is the exact same idea with a different name. An interceptor is a function that wraps every RPC, running before and/or after your handler — the place to centralize cross-cutting concerns so you don't repeat them in every method: authentication, logging, metrics, tracing, panic recovery, rate limiting. They come in two flavours: unary interceptors (wrap one-shot calls) and stream interceptors (wrap streaming calls), on both the client and the server side.

The interceptor chain
Cross-cutting logic wraps the handler, just like HTTP middleware
incoming RPC → recover (catch panics) → logging (+ metrics, trace) → auth (verify token) → your handler (business logic)

Order matters (recover outermost, auth nearest the handler): the same ordering discipline as HTTP middleware.

← response unwinds back through each interceptor (logging records status, recover guards the return)
// A unary server interceptor: signature is fixed by the framework.
func authInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {

    md, _ := metadata.FromIncomingContext(ctx)
    tokens := md.Get("authorization")
    if len(tokens) == 0 || !valid(tokens[0]) {
        return nil, status.Error(codes.Unauthenticated, "missing or invalid token")
    }
    // attach the verified identity for the handler to read from ctx
    ctx = context.WithValue(ctx, userKey{}, parse(tokens[0]))
    return handler(ctx, req) // call the next link / the real handler
}

func loggingInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    log.Printf("%s took %s -> %s", info.FullMethod, time.Since(start), status.Code(err))
    return resp, err
}

// Register the chain when building the server (outermost listed first):
s := grpc.NewServer(
    grpc.ChainUnaryInterceptor(loggingInterceptor, authInterceptor),
)
from concurrent import futures

import grpc

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        md = dict(handler_call_details.invocation_metadata)
        token = md.get("authorization", "")
        if not valid(token):
            # short-circuit: abort before the handler ever runs
            def deny(request, context):
                context.abort(grpc.StatusCode.UNAUTHENTICATED, "missing or invalid token")
            return grpc.unary_unary_rpc_method_handler(deny)
        return continuation(handler_call_details)  # proceed to next / handler

# Register interceptors when building the server (applied in order):
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor()],
)

# Client-side interceptors also exist (e.g. to inject auth on every call):
#   channel = grpc.intercept_channel(base_channel, MyClientInterceptor())

This is the gRPC home for everything the layers manual put in middleware: auth, logging, tracing, rate limiting, panic recovery — written once, applied to every method.
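The ordering rule is easy to check in isolation. The sketch below rebuilds the chain with plain functions and no gRPC at all; chain, recover, log_calls, auth and the string return values are all stand-ins invented for this illustration.

```python
from typing import Callable, Dict, List

Handler = Callable[[Dict[str, str]], str]
Interceptor = Callable[[Dict[str, str], Handler], str]


def chain(interceptors: List[Interceptor], handler: Handler) -> Handler:
    """Wrap handler so that interceptors[0] ends up outermost."""
    def bind(interceptor: Interceptor, nxt: Handler) -> Handler:
        return lambda request: interceptor(request, nxt)

    wrapped = handler
    for interceptor in reversed(interceptors):   # innermost is wrapped first
        wrapped = bind(interceptor, wrapped)
    return wrapped


events: List[str] = []


def recover(request, nxt):
    try:
        return nxt(request)
    except Exception as exc:                     # a crash inside must not escape
        events.append(f"recover: caught {exc}")
        return "INTERNAL"


def log_calls(request, nxt):
    events.append("logging: before")
    result = nxt(request)
    events.append(f"logging: after -> {result}")
    return result


def auth(request, nxt):
    if request.get("authorization") != "Bearer good":
        events.append("auth: rejected")
        return "UNAUTHENTICATED"                 # short-circuit: handler never runs
    events.append("auth: ok")
    return nxt(request)


def handler(request):
    events.append("handler")
    return "OK"


rpc = chain([recover, log_calls, auth], handler)

ok_result = rpc({"authorization": "Bearer good"})
ok_events = list(events)
events.clear()

denied_result = rpc({})
denied_events = list(events)
```

For the accepted call, ok_events reads: logging before, auth ok, handler, logging after with OK. For the rejected call the handler entry is absent, yet logging still records the UNAUTHENTICATED outcome because it sits outside auth.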

Part V · Production & Interop
16

Authentication & Security

gRPC security splits into two orthogonal questions: channel security (is the connection encrypted and is the peer who they claim to be? → TLS / mTLS) and call credentials (who is the caller for this request? → a token in metadata). You combine them: TLS protects the pipe, a per-call token identifies the user. The insecure credentials used in earlier examples are for local development only — never ship them.

| Layer | Mechanism | Answers |
| --- | --- | --- |
| Transport | TLS (server presents a cert) | encrypted? + is the server genuine? |
| Transport | mTLS (both sides present certs) | + is the client service genuine? (service-to-service identity) |
| Per-call | token in metadata (JWT / OAuth bearer) | which user is making this call? (authn/authz, verified in an interceptor §15) |
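import (
    "crypto/tls"
    "crypto/x509"
    "log"
    "os"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

// ===== SERVER over TLS =====
creds, err := credentials.NewServerTLSFromFile("server.crt", "server.key")
if err != nil {
    log.Fatalf("load server certificate: %v", err)
}
s := grpc.NewServer(grpc.Creds(creds)) // every connection is now encrypted

// ===== CLIENT over TLS =====
caPEM, err := os.ReadFile("ca.crt") // the CA that signed the server certificate
if err != nil {
    log.Fatalf("read CA certificate: %v", err)
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(caPEM) {
    log.Fatal("ca.crt contains no usable certificates")
}
tlsCreds := credentials.NewTLS(&tls.Config{RootCAs: pool}) // trust this CA
conn, err := grpc.NewClient("api.example.com:443", grpc.WithTransportCredentials(tlsCreds))
if err != nil {
    log.Fatalf("create client: %v", err)
}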

// ===== Per-call token (combine with TLS) =====
// Implement credentials.PerRPCCredentials so a fresh token rides on every call:
type tokenCreds struct{ token string }
func (t tokenCreds) GetRequestMetadata(ctx context.Context, _ ...string) (map[string]string, error) {
    return map[string]string{"authorization": "Bearer " + t.token}, nil
}
func (t tokenCreds) RequireTransportSecurity() bool { return true } // refuse to send token in cleartext

conn, _ = grpc.NewClient("api.example.com:443",
    grpc.WithTransportCredentials(tlsCreds),
    grpc.WithPerRPCCredentials(tokenCreds{token}),
)
# ===== SERVER over TLS =====
with open("server.key", "rb") as k, open("server.crt", "rb") as c:
    creds = grpc.ssl_server_credentials([(k.read(), c.read())])
server.add_secure_port("[::]:443", creds)   # encrypted port

# ===== CLIENT over TLS =====
with open("ca.crt", "rb") as f:
    channel_creds = grpc.ssl_channel_credentials(root_certificates=f.read())

# ===== Per-call token, composed with the channel credentials =====
class TokenAuth(grpc.AuthMetadataPlugin):
    def __init__(self, token): self.token = token
    def __call__(self, context, callback):
        callback((("authorization", f"Bearer {self.token}"),), None)  # adds metadata per call

call_creds = grpc.metadata_call_credentials(TokenAuth(token))
composite  = grpc.composite_channel_credentials(channel_creds, call_creds)
channel    = grpc.secure_channel("api.example.com:443", composite)
Never send tokens over plaintext

Bearer tokens are like passwords — whoever holds one can impersonate the caller (echoing the JWT-theft warning from the auth manual). Always require transport security before attaching credentials. mTLS is the standard for service-to-service identity inside a mesh; user-level authz still rides on a per-call token that an interceptor verifies.

17

Resilience & Load Balancing

Because every remote call can fail in network-shaped ways, production gRPC leans on a few resilience features — mostly configured, not hand-coded. The non-obvious one is load balancing: gRPC's long-lived, multiplexed connection (§06) breaks naive load balancers, and understanding why is essential to deploying it.

The load-balancing pitfall
An L4 balancer pins every call to one backend
L4 (TCP) balancer, WRONG for gRPC: client → L4 LB (per-conn) → backend 1 (hot), backend 2 idle. All multiplexed calls land on one box.

L7 / client-side balancing, RIGHT: client → L7 / mesh (per-RPC) → backend 1, backend 2, backend 3. Individual RPCs spread across backends.
Fixes: an L7 proxy that balances per-request (Envoy/Linkerd), or client-side balancing where the client resolves all backends and round-robins RPCs itself. A plain TCP balancer will funnel an entire connection's traffic to one pod.
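A toy simulation shows the difference in numbers. It is only a model of the two decision points (once per connection versus once per RPC); the backend names and the function names are made up for this sketch, and real balancers use richer policies than strict round-robin.

```python
from collections import Counter
from itertools import cycle
from typing import List

BACKENDS: List[str] = ["backend-1", "backend-2", "backend-3"]


def l4_per_connection(num_rpcs: int) -> Counter:
    """The balancer chooses once, when the TCP connection opens, then never again."""
    chosen = BACKENDS[0]                      # decided at connect time
    return Counter(chosen for _ in range(num_rpcs))


def per_rpc_round_robin(num_rpcs: int) -> Counter:
    """An L7 proxy or client-side balancer chooses again for every RPC."""
    picker = cycle(BACKENDS)
    return Counter(next(picker) for _ in range(num_rpcs))


pinned = l4_per_connection(900)
spread = per_rpc_round_robin(900)

print(dict(pinned))   # {'backend-1': 900}
print(dict(spread))   # {'backend-1': 300, 'backend-2': 300, 'backend-3': 300}
```

With HTTP/1.1 the per-connection choice is less harmful because clients open many short connections. gRPC's single long-lived connection is what turns it into a hot spot.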

Retries, keepalive, health

gRPC supports declarative retries via a service config (a JSON policy attached to the channel): which status codes are retryable (typically UNAVAILABLE), how many attempts, and exponential backoff — no retry loops in your code. Keepalive pings detect dead connections and keep idle ones alive through NATs/proxies. A standard health-checking service lets load balancers and Kubernetes probes ask "are you ready?" Together these are the resilience baseline.

service-config.json — declarative retry policy (language-agnostic)
{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "2s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": [ "UNAVAILABLE" ]
    }
  }]
}
// Attach to the channel (Go: grpc.WithDefaultServiceConfig(json);
// Python: grpc.insecure_channel(target, options=[("grpc.service_config", json)]))
// Only retry IDEMPOTENT methods automatically — retrying a non-idempotent
// "charge card" can double-charge (same lesson as POST idempotency keys in REST).
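To see what that policy means in practice, the sketch below parses the same values and prints the backoff ceiling before each retry. It assumes maxAttempts counts the original attempt, and it deliberately leaves out the random jitter that real clients apply to each delay. nominal_backoffs and parse_seconds are hypothetical helpers, not gRPC APIs.

```python
import json
from typing import List

SERVICE_CONFIG = json.loads("""
{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "2s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": [ "UNAVAILABLE" ]
    }
  }]
}
""")


def parse_seconds(duration: str) -> float:
    """Turn a service-config duration such as "0.1s" into a float."""
    return float(duration.rstrip("s"))


def nominal_backoffs(policy: dict) -> List[float]:
    """Backoff ceiling before each retry, ignoring jitter."""
    initial = parse_seconds(policy["initialBackoff"])
    cap = parse_seconds(policy["maxBackoff"])
    multiplier = policy["backoffMultiplier"]
    retries = policy["maxAttempts"] - 1        # the first attempt is not a retry
    return [min(initial * multiplier ** n, cap) for n in range(retries)]


policy = SERVICE_CONFIG["methodConfig"][0]["retryPolicy"]
delays = nominal_backoffs(policy)
print(delays)
```

For the policy above this yields three retries with ceilings of roughly 0.1, 0.2 and 0.4 seconds. The 2-second cap only starts to matter with a larger maxAttempts or multiplier.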
Retries need idempotency

Automatic retries are safe only for idempotent methods. Retrying a "create payment" on a timeout can charge twice — the exact danger the REST manual solved with idempotency keys. Mark which methods are safe, and for the rest, use an idempotency key or accept that they aren't auto-retried.
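One way to make a "create payment" safe to retry is to have the server remember results by idempotency key. A minimal in-memory sketch follows; PaymentService, create_payment and the receipt format are hypothetical, and a real service would persist the keys and expire them.

```python
import threading
import uuid
from typing import Dict


class PaymentService:
    """Charges at most once per idempotency key, however often the call is retried."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._receipts: Dict[str, str] = {}
        self.charges_made = 0

    def create_payment(self, idempotency_key: str, amount_cents: int) -> str:
        with self._lock:
            if idempotency_key in self._receipts:
                # A retry of a call we already processed: replay the stored result.
                return self._receipts[idempotency_key]
            self.charges_made += 1               # the real charge would happen here
            receipt = f"receipt-{self.charges_made}-{amount_cents}"
            self._receipts[idempotency_key] = receipt
            return receipt


service = PaymentService()
key = str(uuid.uuid4())            # generated once by the client, reused on every retry

first = service.create_payment(key, 1999)
retry = service.create_payment(key, 1999)    # e.g. after a DEADLINE_EXCEEDED

print(first == retry, service.charges_made)  # True 1
```

The client must reuse the same key across retries of one logical operation and generate a new key for the next one. In gRPC the key could travel in the request message or in metadata (§13).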

18

gRPC vs REST — Choosing

Not a rivalry — different tools. The honest decision rule: gRPC for internal, high-throughput, typed service-to-service traffic; REST/JSON for public, browser-facing, human-debuggable, cache-friendly APIs. Most real systems run both: gRPC between backend services, a REST (or GraphQL) edge for the outside world.

| Dimension | gRPC | REST / JSON |
| --- | --- | --- |
| Payload | Binary protobuf: compact, fast | Text JSON: bulky, human-readable |
| Contract | Enforced by .proto; codegen | Convention + docs (OpenAPI optional) |
| Transport | HTTP/2 only (multiplexed) | Any HTTP, incl. 1.1 |
| Streaming | First-class, bidirectional | Bolt-ons (SSE, polling, WebSockets) |
| Browser support | No native; needs gRPC-Web (§19) | Universal |
| Human-debuggable | Needs tooling (grpcurl) | curl, browser, devtools |
| HTTP caching | Not really | Mature (ETag, Cache-Control) |
| Best fit | Microservices, internal, low-latency, polyglot | Public APIs, web/mobile front doors, third parties |
The pragmatic architecture

A very common shape: clients hit a REST/JSON gateway over HTTP/1.1; behind it, that gateway and all internal services speak gRPC to each other. You get the public-friendliness of REST at the edge and the speed/typing of gRPC in the core — and tools like grpc-gateway (§19) can even generate the REST edge from the same proto.

19

gRPC-Web & Browser Interop

A hard constraint, and a frequent surprise: browsers cannot speak native gRPC. The reason is from §06 — gRPC needs fine-grained control over HTTP/2 frames and trailers, and browser fetch/XHR don't expose that. So talking to gRPC from a web frontend requires a translation layer.

Bridging to the browser
A proxy translates between browser-friendly and native gRPC
Browser (gRPC-Web client) → [gRPC-Web, HTTP/1.1-friendly] → Proxy (Envoy / gateway) → [native gRPC, HTTP/2 + trailers] → gRPC service (native gRPC)

Your options, roughly in order of how much you want gRPC on the frontend:

  • gRPC-Web is a variant protocol plus a generated JS/TS client; a proxy (Envoy has a built-in filter) translates it to real gRPC. Streaming is limited: unary and server-streaming calls work, but client-streaming and bidirectional streaming are not available to browser clients.
  • Connect (connectrpc) — a modern protocol family that speaks gRPC, gRPC-Web, and its own HTTP/JSON, often without a separate proxy, from the same handlers. Increasingly the friendliest path.
  • grpc-gateway / transcoding — generate a REST+JSON facade from your .proto (via HTTP annotations). The browser uses plain REST; the gateway transcodes to gRPC. This is the “REST edge from one proto” pattern referenced in §18.
Plan the edge before committing

If a browser must call your service directly, decide the bridge strategy up front — you can't just point fetch at a gRPC port. For internal-only services this never matters; for anything web-facing it's a first-class design decision.

20

Observability & Debugging

Binary payloads mean you can't eyeball traffic like JSON (§05), so gRPC ships an ecosystem to see inside. Knowing these turns a gRPC service from a black box into something you can poke, trace, and probe.

| Tool / feature | What it gives you |
| --- | --- |
| grpcurl | The curl of gRPC: call methods from the CLI with JSON in/out, list services and methods. |
| Server reflection | Lets clients/tools discover a server's services & message schemas at runtime, so grpcurl works without the .proto on hand. |
| Health checking | A standard grpc.health.v1.Health service for readiness/liveness probes (K8s, load balancers). |
| channelz | Built-in introspection of live channels, connections, and per-RPC stats for debugging connectivity. |
| Interceptors (§15) | The hook for metrics (Prometheus), structured logs, and distributed tracing (OpenTelemetry) on every call. |
| Trace propagation | Pass a trace/request ID through metadata (§13) so one request is followable across every service it touches. |
grpcurl — debugging from the shell
# List every service the server exposes (needs reflection enabled).
# grpcurl assumes TLS by default, so a local server without TLS needs -plaintext:
grpcurl -plaintext localhost:50051 list

# List the methods of one service:
grpcurl -plaintext localhost:50051 list user.v1.UserService

# Call a unary method with a JSON request; grpcurl turns it into protobuf for you:
grpcurl -plaintext -d '{"id": "u42"}' localhost:50051 user.v1.UserService/GetUser

# Against a TLS server, drop -plaintext:
grpcurl -d '{"id": "u42"}' api.example.com:443 user.v1.UserService/GetUser
Enable reflection in non-prod

Turn on server reflection in dev/staging so grpcurl and GUI tools (like Postman's gRPC mode or grpcui) can explore your API without you shipping .proto files around. Many teams disable it in production to avoid advertising the schema — a small attack-surface decision.

21

Debug Cheat-Sheet

The whole manual compressed to what you reach for under pressure.

| Concept | One-liner |
| --- | --- |
| gRPC | Typed remote method calls: Protobuf contract, binary serialization, HTTP/2 transport. |
| Protobuf (IDL) | The .proto is the contract; both sides generate code from it. |
| Field numbers | The real wire identity: never change or reuse one; adding fields is safe. |
| Wire format | tag = (field << 3) \| wiretype, varint-packed; small + fast, not human-readable. |
| HTTP/2 | One connection, many interleaved streams; one RPC = one stream. |
| Unary | 1→1: the normal call (~90% of RPCs). |
| Server stream | 1→many: feeds, big result sets. |
| Client stream | many→1: uploads, batch aggregation. |
| Bidi stream | many↔many: chat, realtime, full-duplex. |
| Channel vs stub | Channel = reused connection (one, shared); stub = cheap per-call caller. |
| Metadata | Key–value side-channel = HTTP headers; carries tokens & trace IDs. |
| Deadline | Absolute time bound, propagates across hops; set one on every call. |
| Status codes | gRPC's own enum (OK, NOT_FOUND, UNAVAILABLE…), in a trailer, not HTTP codes. |
| Interceptors | gRPC middleware: auth, logging, tracing, recovery; unary & stream. |
| Security | TLS/mTLS for the pipe + token-in-metadata for the user; never insecure in prod. |
| Load balancing | Use L7 / client-side; an L4 TCP balancer pins everything to one backend. |
| Retries | Declarative via service config; only auto-retry idempotent methods. |
| Browser | No native gRPC: use gRPC-Web, Connect, or a REST gateway. |
| Debug | grpcurl + reflection; channelz for connections; health service for probes. |
| vs REST | gRPC = internal/typed/fast; REST = public/debuggable/cacheable. Run both. |
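The wire-format row is the one most worth seeing as bytes. Below is a minimal stdlib sketch of the tag rule and varint packing; encode_varint and encode_tag are illustrative helpers, not the protobuf library's API, and only the VARINT (0) and LEN (2) wire types are shown.

```python
VARINT = 0   # wire type for int32/int64/bool/enum
LEN = 2      # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Pack a non-negative integer 7 bits at a time, low bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_seven = value & 0x7F
        value >>= 7
        if value:
            out.append(low_seven | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low_seven)          # high bit clear: last byte
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """tag = (field << 3) | wiretype, itself written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


# Field 1 holding the integer 150.
int_field = encode_tag(1, VARINT) + encode_varint(150)

# Field 2 holding the string "testing": tag, byte length, then the UTF-8 bytes.
payload = "testing".encode("utf-8")
str_field = encode_tag(2, LEN) + encode_varint(len(payload)) + payload

print(int_field.hex())   # 089601
print(str_field.hex())   # 120774657374696e67
```

Field numbers 1 through 15 fit in a one-byte tag; field 16 already needs two (encode_tag(16, VARINT) is 80 01), which is why the most frequently set fields are usually given the lowest numbers.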

The whole topic in one breath: gRPC lets a client call a server's method as if it were local. You define the API once in a .proto (§04), protoc generates a typed stub and server skeleton (§11), arguments are serialized as compact protobuf (§05) and carried over multiplexed HTTP/2 streams (§06) in one of four shapes — unary, server-, client-, or bidirectional-streaming (§07–10). A long-lived channel hosts cheap per-call stubs (§12); metadata carries auth and trace context, deadlines bound and propagate the call, cancellation reclaims work (§13); every call ends with a gRPC status code (§14); interceptors centralize cross-cutting logic (§15); TLS/mTLS plus per-call tokens secure it (§16); declarative retries and L7/client-side balancing make it resilient (§17). Reach for gRPC inside your system and REST at the public edge (§18–19), and lean on grpcurl, reflection and interceptors to see inside (§20).

Grounded in grpc.io & protobuf.dev docs · MDN for HTTP/2 background · Go 1.22+ (google.golang.org/grpc) / Python 3.11+ (grpcio) examples.