gRPC, calling a remote
function like a local one.
A first-principles walkthrough of the RPC framework used inside Google, Netflix, Uber and many modern microservice fleets: from the contract (Protocol Buffers) and the binary wire format, through the HTTP/2 transport and the four streaming shapes, to deadlines, interceptors, mTLS and production resilience. Written to explain not just what each piece does but why it exists and how it works underneath. Implementations in both Go and Python.
What gRPC Actually Is
gRPC is a framework for calling a function that lives on another machine as if it were a
local
function in your own process. The name expands to gRPC Remote Procedure Call
(a recursive
acronym; the "g" has been backronymed to a different word in every release for fun). The phrase that
matters
is the middle one — Remote Procedure Call. You write user, err :=
client.GetUser(ctx, &pb.GetUserRequest{Id: 42}) and it looks like an ordinary method
call,
but under the hood the arguments are serialized, shipped across the network to a server, executed
there, and
the return value is shipped back — all hidden behind that one line.
Three things are true about gRPC and you need all three in your head at once. It is (1) a
contract-first
RPC framework: you describe your API in a .proto file and code is
generated from it; (2)
it serializes data with Protocol Buffers, a compact binary format, instead of text
like JSON;
and (3) it transports those bytes over HTTP/2, which gives it multiplexing and real
bidirectional streaming. Everything else in this manual is a consequence of those three facts.
REST addresses resources; RPC calls methods such as GetUser or SendPayment, and the work happens elsewhere. One models nouns; the other models verbs.
"Procedure call" is the whole idea
In ordinary code, calling a function is invisible plumbing: you pass arguments, the CPU jumps to the function, it returns a value. RPC asks a simple question — what if that function lived on a different computer? The dream of RPC (which dates back to the 1980s) is to make the network disappear, so a distributed system feels like one program. gRPC is the modern, production-grade realization of that dream: you never write socket code, never parse a response by hand, never build a URL. You call a typed method and handle a typed result or a typed error.
gRPC = typed remote method calls, defined by a Protobuf contract, serialized as compact binary, carried over HTTP/2 streams — so a client in any language can call a server in any other language as if it were local.
Why gRPC Exists
gRPC isn't a replacement for REST everywhere — it was built to solve specific pains that show up when many services (often in different languages) talk to each other constantly, at high volume, inside a system. Google built it (open-sourced in 2015, evolved from an internal system called Stubby) precisely because REST-over-JSON was too slow, too loose, and too limited for service-to-service traffic at their scale. Five concrete problems drove its design.
| The problem with REST/JSON | What gRPC does instead |
|---|---|
| JSON is bulky & slow. Text, with repeated field names on every object; parsing is CPU-heavy. | Protobuf binary: field numbers not names, varint-packed. Often 3–10× smaller and far faster to (de)serialize. |
| No enforced contract. The shape of a JSON payload lives in docs (or someone's head); drift causes runtime breakage. | The .proto file is the contract. Client & server generate code from the same source; mismatches fail at compile time. |
| Weak streaming. Plain HTTP/1.1 REST is request→response; streaming needs bolt-ons (SSE, polling, raw WebSockets). | Four call types including full bidirectional streaming, native to the framework over HTTP/2. |
| Polyglot friction. Every language hand-writes its own client & serialization, inconsistently. | One .proto generates idiomatic clients/servers for Go, Python, Java, C++, Rust, and more. |
| HTTP/1.1 connection overhead. One request per connection (or head-of-line blocking), repeated handshakes. | HTTP/2 multiplexes many concurrent calls over one long-lived connection. |
The honest scope
Those strengths are aimed at internal, machine-to-machine communication — microservices, backend-to-backend, mobile-to-backend where you control both ends. gRPC is weaker than REST for public, browser-facing APIs (browsers can't speak raw gRPC — see §19) and for human-debuggable, cache-friendly, broadly-compatible endpoints. The full comparison is §18; for now, hold the frame: gRPC is the internal nervous system; REST is the public front door.
Where it already runs: service meshes (Istio, Linkerd), parts of Kubernetes' own machinery (for example the container runtime interface), etcd, CockroachDB, Envoy's xDS, and the internal call graphs of many large fintech and consumer platforms. If you've sent a payment or streamed a video, there is a good chance gRPC carried some hop of that request between backend services.
The RPC Mental Model
Before any syntax, internalize the machinery that makes a remote call look local, because every gRPC concept later is just a named part of this picture. A local function call and a remote one differ in exactly one place: between "call" and "execute," the arguments have to cross a network. RPC inserts two pieces of generated code — a stub on the client and a skeleton/handler on the server — to hide that crossing.
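To make the stub/skeleton picture concrete, here is a minimal hand-written sketch. It is not how gRPC is implemented: JSON stands in for protobuf, an in-process function call stands in for the network, and every name (UserStub, skeleton, METHODS) is hypothetical. It only shows where the "crossing" is hidden.

```python
import json

# --- Server side ---------------------------------------------------------
def get_user(user_id: str) -> dict:
    """The real function; it exists only on the 'server'."""
    return {"id": user_id, "full_name": "Ada"}

# Method name -> implementation (gRPC generates the equivalent dispatch table).
METHODS = {"GetUser": get_user}

def skeleton(request_bytes: bytes) -> bytes:
    """Server skeleton: decode the request, dispatch, encode the reply."""
    call = json.loads(request_bytes)
    result = METHODS[call["method"]](*call["args"])
    return json.dumps({"result": result}).encode()

# --- Client side ---------------------------------------------------------
class UserStub:
    """Client stub: looks like a local object, but only packs and unpacks bytes."""

    def __init__(self, transport):
        self._transport = transport  # any callable taking bytes and returning bytes

    def get_user(self, user_id: str) -> dict:
        request = json.dumps({"method": "GetUser", "args": [user_id]}).encode()
        reply = self._transport(request)  # in real life: a network round trip
        return json.loads(reply)["result"]

# The "network" here is a direct function call to the skeleton.
stub = UserStub(transport=skeleton)
user = stub.get_user("u42")  # reads like a local call
print(user["full_name"])
```

Swapping the transport callable for one that sends the bytes over a socket would leave the caller's code unchanged, which is the whole point of the stub.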
The leaky-abstraction warning
RPC tries to make the network invisible — but the network is never truly invisible, and pretending otherwise is the classic RPC trap. A local call can't time out, get lost, or be reordered; a remote one can do all three. That's why gRPC gives first-class tools for the things a local call never needed: deadlines (§13), status codes for partial failure (§14), retries (§17), and cancellation. Treat every remote call as "a local call that can fail in network-shaped ways," and you'll design resilient systems instead of brittle ones.
The single biggest mistake with RPC is writing remote calls as if they were free and infallible
— no
timeout, no error handling for UNAVAILABLE, no thought about latency in a loop. The
abstraction
is a convenience, not a guarantee. Always pass a context/deadline and always handle the error.
Protocol Buffers — the Contract
Protocol Buffers (“protobuf”) is two things wearing one name: an IDL
(Interface
Definition Language) for describing your messages and services, and a binary
serialization
format for encoding them on the wire. This section is the IDL half — the
.proto file that is the single source of truth both sides generate code from. Get the
contract
right and the client and server literally cannot disagree about the shape of the data.
A .proto file defines messages (the data structures) and
services
(collections of RPC methods). Here is a complete, realistic example — a user service —
annotated
with every rule that matters:
```protobuf
syntax = "proto3";   // always declare the syntax; proto3 is current
package user.v1;     // namespace + a versioning convention (v1, v2...)
option go_package = "example.com/gen/userv1;userv1"; // where generated Go lands

// A message is a typed record. Each FIELD has a type, a name, and a NUMBER.
message User {
  string id = 1;             // the "= 1" is the FIELD NUMBER, not a value
  string email = 2;
  string full_name = 3;
  Role role = 4;             // an enum (declared below)
  repeated string tags = 5;  // "repeated" = a list/array of strings
  int64 created_at = 6;      // unix seconds; proto3 has no date scalar
                             // (google.protobuf.Timestamp is the usual alternative)
}

// An enum is a fixed set of named values. The first MUST be 0 (the default).
enum Role {
  ROLE_UNSPECIFIED = 0;  // 0 is the implicit default; reserve it for "unspecified"
  ROLE_MEMBER = 1;
  ROLE_ADMIN = 2;
}

message GetUserRequest  { string id = 1; }
message GetUserResponse { User user = 1; }  // messages nest inside messages

message CreateUserRequest {
  string email = 1;
  string full_name = 2;
  optional string phone = 3;  // "optional" tracks presence: set vs unset vs ""
}

// A SERVICE is a set of methods. Each takes one message and returns one message.
service UserService {
  rpc GetUser (GetUserRequest) returns (GetUserResponse);
  rpc CreateUser (CreateUserRequest) returns (User);
}
```
The rules that actually bite
- Field numbers are the real identity, not names. On the wire, email is transmitted as field 2, never as the string "email" (this is why protobuf is compact; see §05). The name is for your code; the number is the contract.
- Never change or reuse a field number. This is the cardinal rule of schema evolution. Renaming a field is safe (names aren't on the wire); changing its number or type silently corrupts data for any peer still using the old definition. To remove a field, mark it reserved so the number can never be accidentally reused.
- Adding fields is backward-compatible. Give a new field a fresh number; old clients simply don't see it, new servers treat it as unset when an old client omits it. This is how a gRPC API evolves without breaking deployed consumers: the same additive-change discipline you know from REST versioning.
- proto3 has defaults, not nulls. An unset string is "", an unset int is 0, an unset bool is false. If you must distinguish "absent" from "zero," use optional (which adds presence tracking). It is the same trap as Go's zero values, solved the same way (§13 of the REST/handlers manuals echoes this).
| Construct | Means | Notes |
|---|---|---|
| scalar | int32 int64 uint32 sint32 fixed64 float double bool string bytes | Pick sint* for often-negative numbers; bytes for raw binary. |
| repeated | An ordered list of the field's type | The protobuf equivalent of an array/slice. |
| enum | A fixed value set; first entry must be 0 | Integrity + self-documentation, like a Postgres enum. |
| oneof | At most one of several fields is set | A tagged union, e.g. a result that is either a value or an error. |
| map<k,v> | An associative array | Sugar over a repeated key/value message. |
| optional | Adds explicit presence tracking to a scalar | Distinguishes “unset” from the zero value. |
In REST the contract is prose in a Swagger doc that code may or may not match. In gRPC the
.proto is executable truth: both sides generate from it, so a field you
added or a type
you changed is reflected in both clients and servers the moment they regenerate. The schema
can't silently
drift from the implementation.
How Protobuf Encodes — the Wire Format
The reason gRPC is fast and small comes down to how protobuf turns a message into bytes. You don't hand-write this, but understanding it explains every performance claim and every gotcha. The core trick: each field is written as a tiny tag (which encodes the field number and a wire type) followed by the value — and integers are packed using varints that use fewer bytes for smaller numbers. No field names, no quotes, no commas, no whitespace.
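The varint idea is easy to see in a few lines. The sketch below is a hand-written illustration of base-128 varint encoding for non-negative integers, not the protobuf library's own code; the function names are made up for this example.

```python
from typing import Tuple

def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

for value in (1, 127, 128, 300, 1_700_000_000):
    encoded = encode_varint(value)
    print(value, "->", encoded.hex(), f"({len(encoded)} bytes)")
```

Values below 128 take a single byte, 300 takes two, and a unix timestamp around 1.7 billion takes five, compared with the ten ASCII digits it needs in JSON.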
The tag: field number + wire type in one byte (usually)
Each field on the wire begins with a tag computed as
(field_number << 3) | wire_type. The
low 3 bits are the wire type (how to read the bytes that follow: varint, 64-bit,
length-delimited,
32-bit); the rest is the field number. So the decoder reads the tag, learns "this
is field 2,
and it's length-delimited," and knows exactly how to consume what comes next — even if it has
never seen
that field before (it can skip unknown fields, which is what makes forward-compatibility work).
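The tag rule and the "skip what you don't know" behaviour can be sketched by hand. This is an illustrative encoder/decoder for a subset of the wire format (varint and length-delimited fields, plus skipping fixed-width ones); it is not the protobuf runtime, and the helper names are hypothetical. The field numbers follow the User message from §04.

```python
VARINT, I64, LEN, I32 = 0, 1, 2, 5  # protobuf wire types used here

def encode_varint(n):
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_tag(field_number, wire_type):
    # The tag itself is a varint, so field numbers above 15 need more than one byte.
    return encode_varint((field_number << 3) | wire_type)

def encode_string(field_number, text):
    data = text.encode("utf-8")
    return encode_tag(field_number, LEN) + encode_varint(len(data)) + data

def encode_uint(field_number, value):
    return encode_tag(field_number, VARINT) + encode_varint(value)

def decode(buf, known):
    """Return {field_number: raw value} for known fields; skip everything else."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == I64:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == I32:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:  # unknown fields were consumed correctly, then dropped
            fields[number] = value
    return fields

# id = 1, email = 2, created_at = 6 (field numbers from the User message)
message = (encode_string(1, "u42")
           + encode_string(2, "ada@example.com")
           + encode_uint(6, 1_700_000_000))

old_reader = decode(message, known={1, 2})     # a peer that predates field 6
new_reader = decode(message, known={1, 2, 6})
print(encode_tag(2, LEN).hex(), len(message), old_reader, new_reader[6])
```

The old reader never heard of field 6, yet it decodes the rest of the message without error because the wire type in the tag tells it how many bytes to skip.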
You can't curl a gRPC endpoint and eyeball the JSON. Debugging needs tools that
understand the
schema (grpcurl, reflection — §20). That opacity is the price of the
speed and size;
it's the main reason public/debuggable APIs often stay on REST.
HTTP/2 — the Transport Underneath
gRPC doesn't invent its own transport — it rides on HTTP/2, and almost every gRPC superpower (concurrency, all four streaming shapes, low overhead) is really an HTTP/2 feature. The thing to understand is HTTP/2's central idea: a single TCP connection is divided into many independent, interleaved streams, each carrying a sequence of binary frames. That's what lets one connection carry hundreds of concurrent RPCs — and lets data flow in both directions at once.
How a gRPC call maps onto HTTP/2
Concretely, an RPC is an HTTP/2 POST to a path shaped like
/package.Service/Method
(e.g. /user.v1.UserService/GetUser). gRPC metadata travels as HTTP/2
headers; the
serialized protobuf travels in DATA frames as the body; and the final status (the
gRPC status
code — §14) arrives in HTTP/2 trailers after the body. Streaming simply
means more
than one message flows in one or both directions on that stream before it closes.
| gRPC concept | HTTP/2 mechanism |
|---|---|
| One RPC call | One HTTP/2 stream (request + response) |
| Method being called | :path header = /pkg.Service/Method |
| Metadata (auth tokens, trace IDs) | HTTP/2 request & response headers |
| The request/response message(s) | length-prefixed protobuf in DATA frames |
| Final status + message | grpc-status / grpc-message trailers |
| Streaming | multiple DATA frames before the stream half-closes |
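The "length-prefixed" row deserves one concrete picture. In the gRPC-over-HTTP/2 protocol each message inside the DATA frames is preceded by a 5-byte prefix: a 1-byte compressed flag and a 4-byte big-endian length. The sketch below packs and unpacks that prefix by hand; the function names are hypothetical and real gRPC libraries do this for you.

```python
import struct

PREFIX = struct.Struct(">BI")  # 1-byte compressed flag + 4-byte big-endian length

def frame(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the 5-byte gRPC message prefix to one serialized message."""
    return PREFIX.pack(int(compressed), len(payload)) + payload

def unframe(buf: bytes):
    """Split a byte string holding back-to-back framed messages."""
    messages, pos = [], 0
    while pos < len(buf):
        flag, length = PREFIX.unpack_from(buf, pos)
        pos += PREFIX.size
        if pos + length > len(buf):
            raise ValueError("truncated message")
        messages.append((bool(flag), buf[pos:pos + length]))
        pos += length
    return messages

# Two serialized protobuf messages (here: field 1 as a short string) on one stream.
first = b"\x0a\x03u42"
second = b"\x0a\x03u43"
wire = frame(first) + frame(second)
print(wire.hex())
print(unframe(wire))
```

Because every message announces its own length, a stream can carry any number of messages back to back, which is all "streaming" means at this layer.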
Because gRPC needs HTTP/2 end to end, any proxy/load balancer in the path must speak HTTP/2 and do L7 (request-aware) balancing — a naive L4 TCP balancer will pin every call to one backend, since they all share one connection (§17). And because browsers can't expose raw HTTP/2 framing to JS, browsers can't speak native gRPC at all (§19). The transport choice shapes the whole deployment story.
Unary RPC
The simplest and most common shape: one request, one response, exactly like a normal function call. The client sends a single message, the server does its work and returns a single message. The large majority of real-world RPCs are unary. We'll build the GetUser method from the user.proto in §04, in full, in both languages.
```go
// ===== SERVER =====
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials/insecure" // used by the client below
	"google.golang.org/grpc/status"

	userv1 "example.com/gen/userv1" // generated from user.proto
)

// Embed the generated UnimplementedUserServiceServer for forward-compat.
type server struct {
	userv1.UnimplementedUserServiceServer
}

// The method signature is generated FROM the proto: ctx, *Request -> *Response, error
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
	if req.GetId() == "" {
		return nil, status.Error(codes.InvalidArgument, "id is required") // typed error, §14
	}
	// ...real work: query the DB by req.GetId()...
	u := &userv1.User{Id: req.GetId(), Email: "ada@example.com", FullName: "Ada", Role: userv1.Role_ROLE_ADMIN}
	return &userv1.GetUserResponse{User: u}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	s := grpc.NewServer()
	userv1.RegisterUserServiceServer(s, &server{}) // wire the impl to the service
	log.Println("gRPC on :50051")
	if err := s.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}

// ===== CLIENT =====
func callGetUser() {
	// NewClient replaces the deprecated grpc.Dial; insecure creds for local dev only
	conn, err := grpc.NewClient("localhost:50051", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("NewClient: %v", err)
	}
	defer conn.Close()

	client := userv1.NewUserServiceClient(conn) // the generated STUB

	ctx, cancel := context.WithTimeout(context.Background(), time.Second) // always a deadline, §13
	defer cancel()

	resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"}) // looks local, runs remote
	if err != nil {
		log.Fatalf("GetUser failed: %v", err) // err carries the gRPC status code
	}
	log.Printf("got user: %s", resp.GetUser().GetFullName())
}
```
```python
# ===== SERVER =====
from concurrent import futures

import grpc

import user_pb2 as pb            # generated: messages
import user_pb2_grpc as pb_grpc  # generated: service base classes


class UserService(pb_grpc.UserServiceServicer):
    # Method signature generated FROM the proto: (self, request, context) -> response
    def GetUser(self, request, context):
        if not request.id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")  # typed error, §14
        # ...real work: query the DB by request.id...
        user = pb.User(id=request.id, email="ada@example.com",
                       full_name="Ada", role=pb.ROLE_ADMIN)
        return pb.GetUserResponse(user=user)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb_grpc.add_UserServiceServicer_to_server(UserService(), server)  # wire impl to service
    server.add_insecure_port("[::]:50051")  # insecure = local dev only
    server.start()
    print("gRPC on :50051")
    server.wait_for_termination()


# ===== CLIENT =====
def call_get_user():
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = pb_grpc.UserServiceStub(channel)  # the generated STUB
        try:
            # timeout= is the deadline, §13; looks local, runs remote
            resp = stub.GetUser(pb.GetUserRequest(id="u42"), timeout=1.0)
            print("got user:", resp.user.full_name)
        except grpc.RpcError as e:
            print("failed:", e.code(), e.details())  # carries the gRPC status code


if __name__ == "__main__":
    serve()
```
Notice the symmetry across languages: a generated stub on the client, a generated servicer/server base class on the server, and a method whose exact signature came from the proto.
Server Streaming
One request, a stream of responses. The client asks once; the server sends back many
messages
over time, then closes the stream. Perfect for: returning a large result set in chunks, a live feed
of events,
progress updates on a long job, or paginating without repeated round trips. The proto marks the
response as stream.
```go
// proto: rpc ListUsers(ListUsersRequest) returns (stream User);
// (snippet: assumes imports context, io, log, time and the generated userv1 package)

// ===== SERVER: receive one req, call stream.Send(...) repeatedly =====
func (s *server) ListUsers(req *userv1.ListUsersRequest, stream userv1.UserService_ListUsersServer) error {
	for _, u := range queryUsers(req.GetFilter()) { // imagine this yields a big result set
		if err := stream.Send(u); err != nil { // push one message down the stream
			return err // client gone / cancelled
		}
	}
	return nil // returning nil closes the stream cleanly (sends OK trailer)
}

// ===== CLIENT: call once, then Recv() in a loop until io.EOF =====
func listUsers(client userv1.UserServiceClient) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	stream, err := client.ListUsers(ctx, &userv1.ListUsersRequest{Filter: "active"})
	if err != nil {
		log.Fatalf("ListUsers: %v", err) // never use a stream whose creation failed
	}
	for {
		u, err := stream.Recv()
		if err == io.EOF {
			break // server closed the stream: we're done
		}
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("user: %s", u.GetFullName())
	}
}
```
```python
# proto: rpc ListUsers(ListUsersRequest) returns (stream User);

# ===== SERVER: a generator; every `yield` sends one message =====
class UserService(pb_grpc.UserServiceServicer):
    def ListUsers(self, request, context):
        for u in query_users(request.filter):  # big result set
            yield u  # yielding pushes one message down the stream
        # function returning ends the stream cleanly


# ===== CLIENT: the call returns an ITERABLE of responses =====
def list_users(stub):
    responses = stub.ListUsers(pb.ListUsersRequest(filter="active"), timeout=10.0)
    for u in responses:  # iterate until the server closes the stream
        print("user:", u.full_name)
```
Go uses an explicit stream.Send/stream.Recv pair; Python
expresses
the server side as a generator (yield) and the client side as a plain
iterable
— idiomatic to each language, same wire behavior.
Client Streaming
A stream of requests, one response. The client sends many messages, then the server
replies
once with a summary/result. Ideal for uploads, batch ingestion, or aggregating a series of readings
into a
single computed answer. The proto marks the request as stream.
```go
// proto: rpc UploadEvents(stream Event) returns (UploadSummary);
// (snippet: assumes imports context, io, log, time and the generated userv1 package)

// ===== SERVER: Recv() in a loop, then SendAndClose() once at the end =====
func (s *server) UploadEvents(stream userv1.UserService_UploadEventsServer) error {
	count := 0
	for {
		ev, err := stream.Recv()
		if err == io.EOF { // client finished sending: now reply once
			return stream.SendAndClose(&userv1.UploadSummary{Received: int32(count)})
		}
		if err != nil {
			return err
		}
		store(ev)
		count++
	}
}

// ===== CLIENT: Send() many, then CloseAndRecv() for the single reply =====
func uploadEvents(client userv1.UserServiceClient, events []*userv1.Event) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	stream, err := client.UploadEvents(ctx)
	if err != nil {
		log.Fatalf("UploadEvents: %v", err)
	}
	for _, ev := range events {
		if err := stream.Send(ev); err != nil { // push each one up
			break // the stream is broken; CloseAndRecv below reports the real status
		}
	}
	summary, err := stream.CloseAndRecv() // close our side, await the summary
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("server received %d events", summary.GetReceived())
}
```
```python
# proto: rpc UploadEvents(stream Event) returns (UploadSummary);

# ===== SERVER: request_iterator yields incoming messages; return once =====
class UserService(pb_grpc.UserServiceServicer):
    def UploadEvents(self, request_iterator, context):
        count = 0
        for ev in request_iterator:  # iterate the client's stream
            store(ev)
            count += 1
        return pb.UploadSummary(received=count)  # single reply after client closes


# ===== CLIENT: pass an iterable/generator of requests; get one response =====
def upload_events(stub, events):
    def gen():
        for ev in events:
            yield ev

    summary = stub.UploadEvents(gen(), timeout=30.0)  # returns the single summary
    print("server received", summary.received, "events")
```
The asymmetry is the point: SendAndClose/CloseAndRecv in
Go, a
request_iterator plus a normal return in Python — the server
consumes the whole
request stream before producing its one answer.
Bidirectional Streaming
A stream of requests and a stream of responses, simultaneously and independently.
Both sides
read and write on the same stream at the same time, in any order — this is what HTTP/2's
full-duplex
framing (§06) buys you. It's the shape behind chat, real-time collaboration, live telemetry
with control
messages, and interactive sessions. Both request and response are stream in the proto.
```go
// proto: rpc Chat(stream ChatMessage) returns (stream ChatMessage);
// (snippet: assumes imports context, io, log, time and the generated userv1 package)

// ===== SERVER: loop Recv() and Send() on the same stream =====
func (s *server) Chat(stream userv1.UserService_ChatServer) error {
	for {
		msg, err := stream.Recv()
		if err == io.EOF {
			return nil // client closed its send side
		}
		if err != nil {
			return err
		}
		// echo back (or broadcast to a room, etc.); can Send anytime, any number
		reply := &userv1.ChatMessage{User: "server", Text: "ack: " + msg.GetText()}
		if err := stream.Send(reply); err != nil {
			return err
		}
	}
}

// ===== CLIENT: typically Send in one goroutine, Recv in another =====
func chat(client userv1.UserServiceClient) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) // bound the stream's lifetime
	defer cancel()

	stream, err := client.Chat(ctx)
	if err != nil {
		log.Fatalf("Chat: %v", err)
	}

	done := make(chan struct{})
	go func() { // concurrent receiver
		defer close(done)
		for {
			in, err := stream.Recv()
			if err != nil {
				return // io.EOF (server finished) or a real error
			}
			log.Printf("<< %s", in.GetText())
		}
	}()

	for _, t := range []string{"hi", "how are you", "bye"} {
		if err := stream.Send(&userv1.ChatMessage{User: "ada", Text: t}); err != nil {
			break // stream broken; the receiver goroutine sees the error too
		}
	}
	stream.CloseSend() // signal we're done sending
	<-done             // wait until the receiver has drained the remaining replies
}
```
```python
# proto: rpc Chat(stream ChatMessage) returns (stream ChatMessage);

# ===== SERVER: iterate incoming, yield outgoing (interleaved) =====
class UserService(pb_grpc.UserServiceServicer):
    def Chat(self, request_iterator, context):
        for msg in request_iterator:  # read the client's stream
            yield pb.ChatMessage(user="server", text="ack: " + msg.text)  # write back


# ===== CLIENT: pass a request generator; iterate the response stream =====
def chat(stub):
    def outgoing():
        for t in ["hi", "how are you", "bye"]:
            yield pb.ChatMessage(user="ada", text=t)

    responses = stub.Chat(outgoing())  # both directions live at once
    for reply in responses:
        print("<<", reply.text)
    # (for true concurrency under load, prefer the async API: grpc.aio)
```
Streams are long-lived, so they consume a connection/goroutine for their lifetime — always set deadlines or idle limits, and handle the client vanishing mid-stream. There's no automatic back-pressure knob beyond HTTP/2 flow control, so a fast producer can overwhelm a slow consumer if you don't pace it. And a single stream is ordered but not a transaction — if it breaks halfway, you've delivered a prefix, so design messages to be resumable or idempotent.
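One way to make a stream resumable is to number every message and let the client ask to continue after the last number it saw. The sketch below simulates that with plain Python generators rather than a real gRPC stream; the names (server_stream, resume_after, StreamBroken) and the failure injection are assumptions made for the illustration.

```python
class StreamBroken(Exception):
    """Stands in for a connection dropping mid-stream."""

EVENTS = [f"event-{i}" for i in range(1, 8)]  # pretend server-side log, seq 1..7

def server_stream(resume_after: int, fail_at=None):
    """Yield (seq, payload) for every event with seq > resume_after."""
    for seq, payload in enumerate(EVENTS, start=1):
        if seq <= resume_after:
            continue  # the client already has this one
        if fail_at is not None and seq == fail_at:
            raise StreamBroken(f"connection lost before seq {seq}")
        yield seq, payload

def consume_all(failure_plan):
    """Read the whole stream, reconnecting from the last seen seq after a break."""
    received, last_seq = [], 0
    failures = iter(failure_plan)
    while True:
        try:
            for seq, payload in server_stream(last_seq, fail_at=next(failures, None)):
                received.append(payload)
                last_seq = seq  # remember progress only after handling the message
            return received
        except StreamBroken:
            continue  # reconnect; real code would back off and cap the attempts

# First attempt dies before seq 4, second before seq 6, third runs to the end.
result = consume_all(failure_plan=[4, 6])
print(result)
```

The client ends up with every event exactly once even though the stream broke twice, because the delivered prefix was never re-requested.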
The .proto → Code Workflow
You never write the stubs and skeletons by hand — a compiler, protoc (or the buf
toolchain),
reads your .proto and emits idiomatic source for each target language via a
language-specific
plugin. This is the step that turns the contract into callable code, and it's the
reason a Go
client and a Python server can interoperate flawlessly: both were generated from the same file.
```shell
# Go: install the two plugins once (they must be on your PATH):
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Generate messages (--go_out) AND the service stubs/skeleton (--go-grpc_out):
protoc \
  --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  user.proto

# Produces:
#   user.pb.go       structs for each message (+ getters)
#   user_grpc.pb.go  UserServiceClient (stub) + UserServiceServer (to implement)
# Then in code you simply import the generated package:
#   import userv1 "example.com/gen/userv1"
```
```shell
# Python: install the tooling once:
pip install grpcio grpcio-tools

# Generate messages AND service stubs in one command:
python -m grpc_tools.protoc \
  -I. \
  --python_out=. \
  --grpc_python_out=. \
  user.proto

# Produces:
#   user_pb2.py       message classes
#   user_pb2_grpc.py  UserServiceStub (client) + UserServiceServicer (to implement)
# Then import both in your code:
#   import user_pb2 as pb
#   import user_pb2_grpc as pb_grpc
```
Raw protoc invocations get unwieldy fast. The buf toolchain wraps
it with a
config file, dependency management, breaking-change detection (it fails CI if you'd
violate the
never-reuse-a-field-number rule), and a linter. For anything beyond a toy, prefer buf — it
turns the
schema-evolution discipline from §04 into an automated gate.
Channels, Stubs & a Call End-to-End
Two client-side objects matter, and people conflate them. A channel (Go calls it a
ClientConn) is the long-lived, reusable connection to a server — it manages the
underlying
HTTP/2 connection(s), reconnection, and load-balancing state. A stub is the cheap,
generated
object you create on top of a channel to actually call methods. The rule: create
the channel
once and share it; create stubs freely. Opening a channel per request destroys
performance — you
throw away the connection reuse that was the whole point of HTTP/2.
One channel per server, shared across your whole app, for its whole lifetime. Stubs are throwaway. If your latency is mysteriously bad, the first thing to check is whether you're creating a channel (and thus a fresh HTTP/2 + TLS handshake) on every call.
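A simple way to enforce the "one channel per server" rule is a small process-wide cache. The sketch below is an assumption-laden illustration, not part of the gRPC library: ChannelCache is a hypothetical helper, and the channel factory is injected so the example runs without a server. In real code the factory would be something like grpc.insecure_channel or grpc.secure_channel.

```python
import threading

class ChannelCache:
    """Hands out one long-lived channel per target, created on first use."""

    def __init__(self, factory):
        self._factory = factory   # callable: target string -> channel object
        self._channels = {}
        self._lock = threading.Lock()

    def get(self, target: str):
        with self._lock:          # safe to call from many request threads
            if target not in self._channels:
                self._channels[target] = self._factory(target)
            return self._channels[target]

    def close_all(self):
        with self._lock:
            for channel in self._channels.values():
                channel.close()
            self._channels.clear()

# A stand-in channel so the sketch runs on its own.
class FakeChannel:
    created = 0

    def __init__(self, target):
        FakeChannel.created += 1
        self.target = target
        self.closed = False

    def close(self):
        self.closed = True

cache = ChannelCache(factory=FakeChannel)
a = cache.get("users.internal:50051")
b = cache.get("users.internal:50051")   # same target: reuses the first channel
c = cache.get("billing.internal:50051")
print(FakeChannel.created, a is b, a is c)
```

Stubs would then be built on demand from cache.get(target), which keeps stub creation cheap while the expensive connection setup happens once per server.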
Metadata, Deadlines & Cancellation
These are the tools that acknowledge the network is real (the §03 warning, made concrete). Metadata is gRPC's key–value side-channel — the equivalent of HTTP headers — for things that aren't the message itself: auth tokens, trace/request IDs, API versions. Deadlines put an absolute time bound on a call and, crucially, propagate across hops. Cancellation lets a caller (or a broken connection) abort in-flight work so servers don't toil on results nobody wants.
Deadlines beat timeouts — and they propagate
A gRPC deadline is an absolute point in time, not a per-hop duration. When service A calls B with a 1-second deadline and B calls C, the remaining budget travels along — so C knows it has, say, 600ms left, not a fresh second. This prevents the classic cascade where each layer waits its own full timeout and total latency balloons. Always set a deadline on every call. A call with no deadline can hang forever, pinning resources.
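The budget arithmetic behind deadline propagation can be sketched without any networking. The Deadline class and FakeClock below are hypothetical helpers written for this illustration; a fake clock is used so the numbers are deterministic.

```python
import time

class Deadline:
    """An absolute expiry time that every hop in a call chain can consult."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s  # fixed once, at the edge of the system

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0

class FakeClock:
    """Deterministic clock so the example does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds

clock = FakeClock()
deadline = Deadline(1.0, clock=clock)   # A gives the whole chain 1 second

clock.advance(0.4)                      # B spends 400 ms on its own work
budget_for_c = deadline.remaining()     # C inherits what is left, not a fresh second
print(f"C may use {budget_for_c:.1f}s")

clock.advance(0.7)                      # C is too slow
print("expired:", deadline.expired())
```

In Go the context carries this budget for you when you pass ctx downstream. In Python you would typically forward it explicitly, for example by passing the servicer context's time_remaining() value as the timeout= of the outbound call.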
```go
// ===== CLIENT: attach a deadline + metadata =====
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second) // becomes an absolute deadline
defer cancel() // cancel frees resources whether we time out or finish early

ctx = metadata.AppendToOutgoingContext(ctx,
	"authorization", "Bearer "+token, // auth travels as metadata, §16
	"x-request-id", reqID)            // trace id for correlation, like §15 of the layers manual

resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "u42"})
if status.Code(err) == codes.DeadlineExceeded {
	log.Println("call timed out") // a network-shaped failure a local call never had
}

// ===== SERVER: read metadata, respect the inherited deadline =====
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
	md, _ := metadata.FromIncomingContext(ctx)
	auth := md.Get("authorization") // verify token here (or in an interceptor, §15)

	// ctx already carries the client's remaining deadline + cancellation;
	// pass it straight to the DB driver so slow work is abandoned automatically.
	if ctx.Err() != nil {
		return nil, status.FromContextError(ctx.Err()).Err()
	}
	_ = auth
	return s.lookup(ctx, req.GetId())
}
```
```python
# ===== CLIENT: attach a deadline (timeout=) + metadata =====
metadata = (
    ("authorization", f"Bearer {token}"),  # auth as metadata, §16
    ("x-request-id", req_id),              # trace id for correlation
)
try:
    resp = stub.GetUser(pb.GetUserRequest(id="u42"),
                        timeout=1.0,        # the deadline, in seconds
                        metadata=metadata)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("call timed out")  # a network-shaped failure


# ===== SERVER: read metadata, check for cancellation =====
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        md = dict(context.invocation_metadata())
        auth = md.get("authorization")  # verify token (or in an interceptor, §15)
        if not context.is_active():     # client gone / deadline passed?
            return pb.GetUserResponse() # abandon the work
        return self.lookup(request.id)
```
In Go, thread the incoming ctx into every downstream call (DB queries,
outbound RPCs).
That's what makes deadlines and cancellation actually work — if the client hangs up, the
cancellation
ripples all the way down and frees everything. A handler that ignores ctx keeps
grinding on
abandoned work.
Status Codes & Error Handling
gRPC does not use HTTP status codes. It has its own fixed set of status codes, an enum of 17 values (numbered 0 through 16), sent in the grpc-status trailer (§06). Every call ends with exactly one: OK for success, or one of the error codes with an optional message. Returning the right code is part of your contract, because clients (and retry policies, §17) switch on it. The most commonly used codes are:
| Code | Means | Closest HTTP analogue |
|---|---|---|
| OK | Success | 200 |
| INVALID_ARGUMENT | Client sent bad input (independent of system state) | 400 |
| UNAUTHENTICATED | No / invalid credentials | 401 |
| PERMISSION_DENIED | Authenticated but not allowed | 403 |
| NOT_FOUND | The requested entity doesn't exist | 404 |
| ALREADY_EXISTS | Create conflicts with an existing entity | 409 |
| FAILED_PRECONDITION | System not in a state for the operation | 400/409 |
| RESOURCE_EXHAUSTED | Quota / rate limit hit | 429 |
| DEADLINE_EXCEEDED | Call ran past its deadline | 504 |
| UNAVAILABLE | Transient: server down/overloaded; usually retryable with backoff (take care with non-idempotent calls) | 503 |
| INTERNAL | A real bug / invariant broken | 500 |
| UNIMPLEMENTED | Method not implemented on this server | 501 |
```go
import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// SERVER: return a typed status, not a bare error string.
func (s *server) GetUser(ctx context.Context, req *userv1.GetUserRequest) (*userv1.GetUserResponse, error) {
	if req.GetId() == "" {
		return nil, status.Error(codes.InvalidArgument, "id is required")
	}
	u, found := s.db.Find(req.GetId())
	if !found {
		return nil, status.Errorf(codes.NotFound, "no user with id %q", req.GetId())
	}
	return &userv1.GetUserResponse{User: u}, nil
}

// CLIENT: inspect the code to decide what to do.
resp, err := client.GetUser(ctx, req)
if err != nil {
	st := status.Convert(err) // pull the status out of the error
	switch st.Code() {
	case codes.NotFound:
		// expected: show "not found" in UI
	case codes.Unavailable:
		// transient: retry with backoff, §17
	default:
		log.Printf("unexpected: %v: %s", st.Code(), st.Message())
	}
}
```
import grpc

# SERVER: abort with a code + message (or set_code/set_details then return).
class UserService(pb_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        if not request.id:
            # abort() raises, so execution stops here
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "id is required")
        user = self.db.find(request.id)
        if user is None:
            context.abort(grpc.StatusCode.NOT_FOUND, f"no user with id {request.id!r}")
        return pb.GetUserResponse(user=user)

# CLIENT: catch RpcError and switch on .code()
try:
    resp = stub.GetUser(req, timeout=1.0)
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.NOT_FOUND:
        ...  # expected: show "not found"
    elif e.code() == grpc.StatusCode.UNAVAILABLE:
        ...  # transient: retry with backoff, §17
    else:
        print("unexpected:", e.code(), e.details())
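The status table can also be read as a lookup that a client or an edge gateway consults. The sketch below is a plain-Python stand-in, not the grpc library's enum: `STATUS_TABLE`, `http_analogue` and `is_transient` are hypothetical names, and `FAILED_PRECONDITION` is mapped to 400 here even though the table lists 400/409.

```python
# Plain-Python stand-in for the status table above (not grpc.StatusCode).
# Each entry: code name -> (closest HTTP analogue, transient?).
STATUS_TABLE = {
    "OK": (200, False),
    "INVALID_ARGUMENT": (400, False),
    "UNAUTHENTICATED": (401, False),
    "PERMISSION_DENIED": (403, False),
    "NOT_FOUND": (404, False),
    "ALREADY_EXISTS": (409, False),
    "FAILED_PRECONDITION": (400, False),  # assumption: the table allows 400 or 409
    "RESOURCE_EXHAUSTED": (429, False),
    "DEADLINE_EXCEEDED": (504, False),
    "UNAVAILABLE": (503, True),
    "INTERNAL": (500, False),
    "UNIMPLEMENTED": (501, False),
}


def http_analogue(code: str) -> int:
    """Closest HTTP status for a gRPC code name; unknown names fall back to 500."""
    return STATUS_TABLE.get(code, (500, False))[0]


def is_transient(code: str) -> bool:
    """True only for codes the table marks as transient."""
    return STATUS_TABLE.get(code, (500, False))[1]
```

Keeping the mapping in one table means the retry decision and the gateway translation cannot drift apart.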
When a code + message isn't enough (e.g. per-field validation details, like the REST error envelope), gRPC supports error details: typed protobuf messages attached to the status (the google.rpc types: BadRequest, QuotaFailure, RetryInfo…). The client deserializes them as structured data, never by parsing a human string. This is the same principle as the machine-readable code field in the REST manual's error envelope.
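To make "structured data, never a parsed string" concrete, here is a minimal sketch that models the idea with plain dataclasses. `FieldViolation`, `BadRequest` and `RichStatus` are hypothetical stand-ins that only mimic the shape of the google.rpc messages; real code would use the generated protobuf types instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldViolation:
    """Stand-in for one per-field validation error."""
    field: str
    description: str


@dataclass(frozen=True)
class BadRequest:
    """Stand-in for a typed detail message carrying field violations."""
    field_violations: tuple


@dataclass(frozen=True)
class RichStatus:
    """Stand-in for a status: code + message + typed details."""
    code: str
    message: str
    details: tuple = ()


def field_errors(status: RichStatus) -> dict:
    """Collect {field: description} by detail type, never by parsing status.message."""
    errors = {}
    for detail in status.details:
        if isinstance(detail, BadRequest):
            for violation in detail.field_violations:
                errors[violation.field] = violation.description
    return errors


status = RichStatus(
    code="INVALID_ARGUMENT",
    message="request failed validation",
    details=(
        BadRequest((
            FieldViolation("email", "must contain '@'"),
            FieldViolation("age", "must be >= 0"),
        )),
    ),
)
```

A UI can then attach each description to the matching form field, and the human-readable message can change freely without breaking clients.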
Interceptors — the Middleware of gRPC
If you read the handlers/services/middleware chapter, this is the exact same idea with a different name. An interceptor is a function that wraps every RPC, running before and/or after your handler — the place to centralize cross-cutting concerns so you don't repeat them in every method: authentication, logging, metrics, tracing, panic recovery, rate limiting. They come in two flavours: unary interceptors (wrap one-shot calls) and stream interceptors (wrap streaming calls), on both the client and the server side.
// A unary server interceptor: signature is fixed by the framework.
func authInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {
    md, _ := metadata.FromIncomingContext(ctx)
    tokens := md.Get("authorization")
    if len(tokens) == 0 || !valid(tokens[0]) {
        return nil, status.Error(codes.Unauthenticated, "missing or invalid token")
    }
    // attach the verified identity for the handler to read from ctx
    ctx = context.WithValue(ctx, userKey{}, parse(tokens[0]))
    return handler(ctx, req) // call the next link / the real handler
}

func loggingInterceptor(ctx context.Context, req any, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (any, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    log.Printf("%s took %s -> %s", info.FullMethod, time.Since(start), status.Code(err))
    return resp, err
}

// Register the chain when building the server (outermost listed first):
s := grpc.NewServer(
    grpc.ChainUnaryInterceptor(loggingInterceptor, authInterceptor),
)
from concurrent import futures

import grpc

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        md = dict(handler_call_details.invocation_metadata)
        token = md.get("authorization", "")
        if not valid(token):
            # short-circuit: abort before the handler ever runs
            def deny(request, context):
                context.abort(grpc.StatusCode.UNAUTHENTICATED, "missing or invalid token")
            return grpc.unary_unary_rpc_method_handler(deny)
        return continuation(handler_call_details)  # proceed to next / handler

# Register interceptors when building the server (applied in order):
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor()],
)

# Client-side interceptors also exist (e.g. to inject auth on every call):
# channel = grpc.intercept_channel(base_channel, MyClientInterceptor())
This is the gRPC home for everything the layers manual put in middleware: auth, logging, tracing, rate limiting, panic recovery — written once, applied to every method.
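The "outermost listed first" ordering is easier to see without any framework in the way. The following is a minimal, library-free sketch of how a chain of interceptors wraps a handler; `chain`, the two interceptors and the `events` log are hypothetical names for illustration, not part of the gRPC API.

```python
from typing import Any, Callable, List

Handler = Callable[[Any], Any]
Interceptor = Callable[[Any, Handler], Any]


def chain(interceptors: List[Interceptor], handler: Handler) -> Handler:
    """Wrap handler so that interceptors[0] ends up outermost."""
    def bind(interceptor: Interceptor, next_handler: Handler) -> Handler:
        return lambda request: interceptor(request, next_handler)

    wrapped = handler
    for interceptor in reversed(interceptors):
        wrapped = bind(interceptor, wrapped)
    return wrapped


events: List[str] = []  # records the order in which each layer runs


def logging_interceptor(request, next_handler):
    events.append("log:before")
    try:
        return next_handler(request)
    finally:
        events.append("log:after")  # runs even when an inner layer rejects the call


def auth_interceptor(request, next_handler):
    events.append("auth:check")
    if request.get("token") != "secret":  # hypothetical check
        raise PermissionError("UNAUTHENTICATED")
    return next_handler(request)


def get_user(request):
    events.append("handler")
    return {"id": request["id"]}


call = chain([logging_interceptor, auth_interceptor], get_user)
```

Calling `call({"id": "u42", "token": "secret"})` passes through logging, then auth, then the handler, and unwinds through logging again. Putting logging outermost means rejected calls are still logged.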
Authentication & Security
gRPC security splits into two orthogonal questions: channel security (is the connection encrypted and is the peer who they claim to be? → TLS / mTLS) and call credentials (who is the caller for this request? → a token in metadata). You combine them: TLS protects the pipe, a per-call token identifies the user. The insecure credentials used in earlier examples are for local development only; never ship them.
| Layer | Mechanism | Answers |
|---|---|---|
| Transport | TLS — server presents a cert | encrypted? + is the server genuine? |
| Transport | mTLS — both sides present certs | + is the client service genuine? (service-to-service identity) |
| Per-call | token in metadata (JWT / OAuth bearer) | which user is making this call? (authn/authz, verified in an interceptor §15) |
import (
    "context"
    "crypto/tls"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

// ===== SERVER over TLS =====
// Errors are discarded with _ for brevity; check them in real code.
creds, _ := credentials.NewServerTLSFromFile("server.crt", "server.key")
s := grpc.NewServer(grpc.Creds(creds)) // every connection is now encrypted

// ===== CLIENT over TLS =====
// pool is an *x509.CertPool holding the CA certificate(s) you trust.
tlsCreds := credentials.NewTLS(&tls.Config{RootCAs: pool})
conn, _ := grpc.NewClient("api.example.com:443", grpc.WithTransportCredentials(tlsCreds))

// ===== Per-call token (combine with TLS) =====
// Implement credentials.PerRPCCredentials so a fresh token rides on every call:
type tokenCreds struct{ token string }

func (t tokenCreds) GetRequestMetadata(ctx context.Context, _ ...string) (map[string]string, error) {
    return map[string]string{"authorization": "Bearer " + t.token}, nil
}

// Refuse to send the token in cleartext.
func (t tokenCreds) RequireTransportSecurity() bool { return true }

conn, _ = grpc.NewClient("api.example.com:443",
    grpc.WithTransportCredentials(tlsCreds),
    grpc.WithPerRPCCredentials(tokenCreds{token}),
)
# ===== SERVER over TLS =====
with open("server.key", "rb") as k, open("server.crt", "rb") as c:
    creds = grpc.ssl_server_credentials([(k.read(), c.read())])
server.add_secure_port("[::]:443", creds)  # encrypted port

# ===== CLIENT over TLS =====
with open("ca.crt", "rb") as f:
    channel_creds = grpc.ssl_channel_credentials(root_certificates=f.read())

# ===== Per-call token, composed with the channel credentials =====
class TokenAuth(grpc.AuthMetadataPlugin):
    def __init__(self, token):
        self.token = token

    def __call__(self, context, callback):
        # adds metadata per call
        callback((("authorization", f"Bearer {self.token}"),), None)

call_creds = grpc.metadata_call_credentials(TokenAuth(token))
composite = grpc.composite_channel_credentials(channel_creds, call_creds)
channel = grpc.secure_channel("api.example.com:443", composite)
Bearer tokens are like passwords — whoever holds one can impersonate the caller (echoing the JWT-theft warning from the auth manual). Always require transport security before attaching credentials. mTLS is the standard for service-to-service identity inside a mesh; user-level authz still rides on a per-call token that an interceptor verifies.
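The interceptor examples in §15 call a token check without defining it. As a rough sketch of what the server side of "a per-call token that an interceptor verifies" might look like, the helper below pulls a bearer token out of metadata pairs and compares it in constant time. It assumes a single shared static token (`EXPECTED_TOKEN`, hypothetical); a real service would more likely verify a JWT's signature and expiry instead.

```python
import hmac
from typing import Iterable, Optional, Tuple

# Hypothetical shared secret for illustration; load it from a secret store in practice.
EXPECTED_TOKEN = "s3cr3t-dev-token"


def bearer_token(metadata: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the token from an 'authorization: Bearer <token>' entry, or None."""
    for key, value in metadata:
        if key.lower() == "authorization":
            scheme, _, token = value.partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
    return None


def token_is_valid(metadata: Iterable[Tuple[str, str]]) -> bool:
    """True only if a bearer token is present and matches the expected secret."""
    token = bearer_token(metadata)
    if token is None:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
```

Because anyone holding the token can replay it, this check only makes sense on a channel that is already protected by TLS.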
Resilience & Load Balancing
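Because every remote call can fail in network-shaped ways, production gRPC leans on a few resilience features, mostly configured rather than hand-coded. The non-obvious one is load balancing: gRPC's long-lived, multiplexed connection (§06) breaks naive load balancers. An L4 (TCP) balancer picks a backend once per connection, so every RPC multiplexed onto that connection lands on the same backend; balancing has to happen per RPC, at L7 or in the client. Understanding this is essential to deploying gRPC.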
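A toy simulation shows why a connection-level balancer and a long-lived multiplexed connection interact badly. This is an illustrative sketch only: the backend names and both picker functions are hypothetical, and round-robin stands in for whatever policy a real balancer uses.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["backend-a", "backend-b", "backend-c"]  # hypothetical fleet


def l4_per_connection(num_connections: int, rpcs_per_connection: int) -> Counter:
    """L4 style: a backend is chosen once, when the TCP connection is opened."""
    picker = cycle(BACKENDS)
    load = Counter()
    for _ in range(num_connections):
        backend = next(picker)                # decided at connect time
        load[backend] += rpcs_per_connection  # every stream rides that connection
    return load


def per_rpc(num_rpcs: int) -> Counter:
    """L7 / client-side style: a backend is chosen for each individual RPC."""
    picker = cycle(BACKENDS)
    return Counter(next(picker) for _ in range(num_rpcs))


# One gRPC client holds a single long-lived channel and sends 900 RPCs over it.
pinned = l4_per_connection(num_connections=1, rpcs_per_connection=900)
spread = per_rpc(num_rpcs=900)
```

With one connection, `pinned` puts all 900 calls on a single backend while the other two sit idle; `spread` gives each backend 300. That gap is the reason gRPC deployments lean on L7 proxies or client-side balancing.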
Retries, keepalive, health
gRPC supports declarative retries via a service config (a JSON policy attached to the channel): which status codes are retryable (typically UNAVAILABLE), how many attempts, and exponential backoff, with no retry loops in your code. Keepalive pings detect dead connections and keep idle ones alive through NATs/proxies. A standard health-checking service lets load balancers and Kubernetes probes ask "are you ready?" Together these are the resilience baseline.
{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "2s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": [ "UNAVAILABLE" ]
    }
  }]
}
// Attach to the channel (Go: grpc.WithDefaultServiceConfig(json);
// Python: grpc.insecure_channel(target, options=[("grpc.service_config", json)]))
// Only retry IDEMPOTENT methods automatically: retrying a non-idempotent
// "charge card" can double-charge (same lesson as POST idempotency keys in REST).
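To see what the policy above implies in time, the sketch below parses the same JSON and computes the nominal backoff before each retry. It assumes the usual exponential rule (initial backoff times the multiplier per retry, capped at the maximum) and ignores the random jitter the library applies, so real delays will differ; `nominal_backoffs` and `seconds` are hypothetical helpers.

```python
import json

SERVICE_CONFIG = json.loads("""
{
  "methodConfig": [{
    "name": [{ "service": "user.v1.UserService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "2s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": [ "UNAVAILABLE" ]
    }
  }]
}
""")


def seconds(duration: str) -> float:
    """Convert a duration string such as '0.1s' to float seconds."""
    return float(duration.rstrip("s"))


def nominal_backoffs(policy: dict) -> list:
    """Un-jittered backoff before each retry: min(initial * multiplier**n, max)."""
    initial = seconds(policy["initialBackoff"])
    cap = seconds(policy["maxBackoff"])
    retries = policy["maxAttempts"] - 1  # the first attempt is not a retry
    return [min(initial * policy["backoffMultiplier"] ** n, cap) for n in range(retries)]


policy = SERVICE_CONFIG["methodConfig"][0]["retryPolicy"]
backoffs = nominal_backoffs(policy)
```

With four attempts there are three retries, at nominal backoffs of 0.1 s, 0.2 s and 0.4 s, so the cap of 2 s only matters for policies with more attempts or a larger multiplier.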
Automatic retries are safe only for idempotent methods. Retrying a "create payment" on a timeout can charge twice — the exact danger the REST manual solved with idempotency keys. Mark which methods are safe, and for the rest, use an idempotency key or accept that they aren't auto-retried.
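One way to make a non-idempotent method safe to retry is the idempotency key mentioned above. The sketch below is a minimal in-memory illustration: `PaymentService`, its fields and the key format are hypothetical, and a real server would keep the key-to-result map in durable storage with an expiry.

```python
import uuid


class PaymentService:
    """Toy server that charges at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> response already returned
        self.charges = 0    # how many times the card was actually charged

    def create_payment(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a known key replays the stored response instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges += 1
        response = {"payment_id": f"pay-{self.charges}", "amount_cents": amount_cents}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())                     # the client generates the key once per logical payment
first = service.create_payment(key, 1999)
retry = service.create_payment(key, 1999)   # e.g. resent after a timeout
```

The client reuses the same key for every retry of the same logical payment, so a timeout followed by a resend produces one charge and one consistent response.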
gRPC vs REST — Choosing
Not a rivalry — different tools. The honest decision rule: gRPC for internal, high-throughput, typed service-to-service traffic; REST/JSON for public, browser-facing, human-debuggable, cache-friendly APIs. Most real systems run both: gRPC between backend services, a REST (or GraphQL) edge for the outside world.
| Dimension | gRPC | REST / JSON |
|---|---|---|
| Payload | Binary protobuf — compact, fast | Text JSON — bulky, human-readable |
| Contract | Enforced by .proto; codegen | Convention + docs (OpenAPI optional) |
| Transport | HTTP/2 only (multiplexed) | Any HTTP, incl. 1.1 |
| Streaming | First-class, bidirectional | Bolt-ons (SSE, polling, WebSockets) |
| Browser support | No native — needs gRPC-Web (§19) | Universal |
| Human-debuggable | Needs tooling (grpcurl) | curl, browser, devtools |
| HTTP caching | Not really | Mature (ETag, Cache-Control) |
| Best fit | Microservices, internal, low-latency, polyglot | Public APIs, web/mobile front doors, third parties |
A very common shape: clients hit a REST/JSON gateway over HTTP/1.1; behind it, that gateway and all internal services speak gRPC to each other. You get the public-friendliness of REST at the edge and the speed/typing of gRPC in the core — and tools like grpc-gateway (§19) can even generate the REST edge from the same proto.
gRPC-Web & Browser Interop
A hard constraint, and a frequent surprise: browsers cannot speak native gRPC. The reason is from §06: gRPC needs fine-grained control over HTTP/2 frames and trailers, and browser fetch/XHR don't expose that. So talking to gRPC from a web frontend requires a translation layer.
Your options, roughly in order of how much you want gRPC on the frontend:
- gRPC-Web — a variant protocol plus a generated JS/TS client; a proxy (Envoy has a built-in filter) translates it to real gRPC. Streaming is limited (server-streaming works; full bidirectional generally doesn't).
- Connect (connectrpc) — a modern protocol family that speaks gRPC, gRPC-Web, and its own HTTP/JSON, often without a separate proxy, from the same handlers. Increasingly the friendliest path.
- grpc-gateway / transcoding: generate a REST+JSON facade from your .proto (via HTTP annotations). The browser uses plain REST; the gateway transcodes to gRPC. This is the "REST edge from one proto" pattern referenced in §18.
If a browser must call your service directly, decide the bridge strategy up front; you can't just point fetch at a gRPC port. For internal-only services this never matters; for anything web-facing it's a first-class design decision.
Observability & Debugging
Binary payloads mean you can't eyeball traffic like JSON (§05), so gRPC ships an ecosystem to see inside. Knowing these turns a gRPC service from a black box into something you can poke, trace, and probe.
| Tool / feature | What it gives you |
|---|---|
| grpcurl | The curl of gRPC: call methods from the CLI with JSON in/out, list services and methods. |
| Server reflection | Lets clients/tools discover a server's services & message schemas at runtime, so grpcurl works without the .proto on hand. |
| Health checking | A standard grpc.health.v1.Health service for readiness/liveness probes (K8s, load balancers). |
| channelz | Built-in introspection of live channels, connections, and per-RPC stats for debugging connectivity. |
| Interceptors (§15) | The hook for metrics (Prometheus), structured logs, and distributed tracing (OpenTelemetry) on every call. |
| Trace propagation | Pass a trace/request ID through metadata (§13) so one request is followable across every service it touches. |
# List every service the server exposes (needs reflection enabled):
grpcurl localhost:50051 list
# List the methods of one service:
grpcurl localhost:50051 list user.v1.UserService
# Call a unary method with a JSON request — grpcurl turns it into protobuf for you:
grpcurl -d '{"id": "u42"}' localhost:50051 user.v1.UserService/GetUser
# Against a TLS server, drop -plaintext; for local insecure servers, add it:
grpcurl -plaintext -d '{"id":"u42"}' localhost:50051 user.v1.UserService/GetUser
Turn on server reflection in dev/staging so grpcurl and GUI tools (like Postman's gRPC mode or grpcui) can explore your API without you shipping .proto files around. Many teams disable it in production to avoid advertising the schema, a small attack-surface decision.
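The trace-propagation row in the table above comes down to one rule: reuse the caller's ID if there is one, otherwise mint one, and forward it on every outgoing call. A minimal sketch of that rule follows; the `x-request-id` key and the `outgoing_metadata` helper are assumptions for illustration, and tracing libraries such as OpenTelemetry define their own header names.

```python
import uuid
from typing import Iterable, List, Tuple

TRACE_KEY = "x-request-id"  # hypothetical metadata key


def outgoing_metadata(incoming: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the metadata to attach to downstream calls for this request."""
    for key, value in incoming:
        if key.lower() == TRACE_KEY:
            return [(TRACE_KEY, value)]        # keep the caller's ID across the hop
    return [(TRACE_KEY, uuid.uuid4().hex)]     # first hop: mint a new ID
```

An interceptor is the natural place to call this, so every handler forwards the ID without thinking about it.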
Debug Cheat-Sheet
The whole manual compressed to what you reach for under pressure.
| Concept | One-liner |
|---|---|
| gRPC | Typed remote method calls — Protobuf contract, binary serialization, HTTP/2 transport. |
| Protobuf (IDL) | The .proto is the contract; both sides generate code from it. |
| Field numbers | The real wire identity — never change or reuse one; adding fields is safe. |
| Wire format | tag = (field << 3) \| wiretype, varint-packed; small + fast, not human-readable. |
| HTTP/2 | One connection, many interleaved streams; one RPC = one stream. |
| Unary | 1→1 — the normal call (~90% of RPCs). |
| Server stream | 1→many — feeds, big result sets. |
| Client stream | many→1 — uploads, batch aggregation. |
| Bidi stream | many↔many — chat, realtime, full-duplex. |
| Channel vs stub | Channel = reused connection (one, shared); stub = cheap per-call caller. |
| Metadata | Key–value side-channel = HTTP headers; carries tokens & trace IDs. |
| Deadline | Absolute time bound, propagates across hops — set one on every call. |
| Status codes | gRPC's own enum (OK, NOT_FOUND, UNAVAILABLE…), in a trailer — not HTTP codes. |
| Interceptors | gRPC middleware — auth, logging, tracing, recovery; unary & stream. |
| Security | TLS/mTLS for the pipe + token-in-metadata for the user; never insecure in prod. |
| Load balancing | Use L7 / client-side — an L4 TCP balancer pins everything to one backend. |
| Retries | Declarative via service config; only auto-retry idempotent methods. |
| Browser | No native gRPC — use gRPC-Web, Connect, or a REST gateway. |
| Debug | grpcurl + reflection; channelz for connections; health service for probes. |
| vs REST | gRPC = internal/typed/fast; REST = public/debuggable/cacheable. Run both. |
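The wire-format row is compact enough to work through by hand. The sketch below encodes a tag and a varint for a single unsigned integer field; the function names are hypothetical and only the varint wire type (0) is covered, so treat it as an illustration of the formula rather than a protobuf encoder.

```python
WIRETYPE_VARINT = 0  # wire type used for int32/int64/uint/bool/enum fields


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """tag = (field_number << 3) | wire_type, itself written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """One varint-typed field: tag bytes followed by the value bytes."""
    return encode_tag(field_number, WIRETYPE_VARINT) + encode_varint(value)


encoded = encode_uint_field(1, 150)  # field number 1 holding the value 150
```

Here `encoded` is the three bytes 08 96 01: one byte of tag for field 1, then 150 split into two 7-bit groups. The field name never appears on the wire, which is why the field number is the real identity.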
The whole topic in one breath: gRPC lets a client call a server's method as if it were local. You define the API once in a .proto (§04), protoc generates a typed stub and server skeleton (§11), arguments are serialized as compact protobuf (§05) and carried over multiplexed HTTP/2 streams (§06) in one of four shapes: unary, server-, client-, or bidirectional-streaming (§07–10). A long-lived channel hosts cheap per-call stubs (§12); metadata carries auth and trace context, deadlines bound and propagate the call, cancellation reclaims work (§13); every call ends with a gRPC status code (§14); interceptors centralize cross-cutting logic (§15); TLS/mTLS plus per-call tokens secure it (§16); declarative retries and L7/client-side balancing make it resilient (§17). Reach for gRPC inside your system and REST at the public edge (§18–19), and lean on grpcurl, reflection and interceptors to see inside (§20).
Grounded in grpc.io & protobuf.dev docs · MDN for HTTP/2 background · Go 1.22+ (google.golang.org/grpc) / Python 3.11+ (grpcio) examples.