Module 10 / 20 · Phase C — Scale & Reliability · 40 min

Load balancing,
introduced.

You have ten servers. Each client only knows about one address. Something has to bridge those two facts — and decide, ten thousand times a second, which server handles each request. That something is a load balancer, and it's the quiet hero of every modern system.

// What you'll know by the end

  • What a load balancer is and where it sits
  • The four common routing algorithms
  • How health checks keep traffic away from dead servers
  • Why the LB itself shouldn't be a single point of failure
§ 01 — The missing piece

You added more boxes.
Now how does the traffic find them?

Module 09 ended with you scaling horizontally: instead of one giant server, you've got 25 modest ones, all running the same code. Excellent. But there's a problem you haven't solved. The client only knows one thing: api.systemdesigntutorial.com. That's a single hostname pointing — historically — to a single server. With 25 servers, what does the address point to now? All of them? None of them? Whichever one's least busy?

// THE PROBLEM IN ONE PICTURE
[Figure: clients sending 10K requests/sec toward a missing piece ("?") that sits in front of servers S1 to S10. Caption: "Which one of those servers gets this request?"]

The honest answer is: the client neither knows nor cares. Whoever owns api.systemdesigntutorial.com needs to put something in the middle that does the deciding — a small piece of software whose entire job is to receive incoming requests and spray them across the pool of servers behind it. Without this piece, horizontal scaling doesn't actually work. That's the load balancer.

§ 02 — What it does, exactly

One door
to many rooms.

A load balancer is a piece of software (or a piece of hardware, or — most commonly today — a managed cloud service) that sits between clients and a pool of servers. From the outside, it presents itself as a single address. From the inside, it knows about all your servers and forwards each request to one of them, according to some algorithm.

// WHERE IT SITS · THE CANONICAL PICTURE

[Figure: clients (browsers, apps) reach api.systemdesigntutorial.com, which is the load balancer. The LB picks a server, forwards the request, and runs health checks against the server pool (Server 1 to Server N, same code everywhere).]

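That single picture is most of what you need. Three things the LB does, every single request, ten thousand times a second:

  • Accept the request at the single public address clients know about.
  • Pick a server from the pool, using its routing algorithm.
  • Forward the request to that server and pass its response back to the client.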

Outside: one address. Inside: many servers. The LB hides one from the other.

And there's a fourth thing — running quietly in the background — that turns the LB from a switchboard into something actually useful: health checks. Every few seconds, the LB pings each server with GET /health. If a server stops responding, the LB removes it from the pool and stops sending traffic. Suddenly, "server crashed" becomes a non-event instead of an outage. We'll come back to that in §05.

§ 03 — Four routing algorithms · interactive

How does it
actually decide?

Every load balancer comes with a menu of algorithms. They all share the same goal, spreading the work evenly, but they go about it differently, and the differences matter when servers are uneven, slow, or dying.

// THE CLASSIC

Round Robin

// "next, please"

The simplest algorithm. Keep a counter. For each new request, send it to server[counter % N], then increment. Cycle through forever. Stupid simple, surprisingly effective when servers are uniform.

Strengths: Trivial to implement. Perfectly even distribution over the long run. No state to keep besides the counter.

Weaknesses: Blind to server health and current load. If one server is slow, it still gets its 1/N share — and the requests pile up there while other servers idle. Doesn't react to anything.

Pick when: Servers are roughly identical and requests are similar in cost.
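A minimal Python sketch of that counter, assuming a fixed list of servers. The class and server names are hypothetical, and a real LB would also need to make the counter safe when many requests arrive concurrently.

```python
class RoundRobin:
    """Cycle through servers in order: server[counter % N], then increment."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.counter = 0  # the only state round robin needs

    def pick(self) -> str:
        server = self.servers[self.counter % len(self.servers)]
        self.counter += 1
        return server


lb = RoundRobin(["s1", "s2", "s3"])
print([lb.pick() for _ in range(7)])
# → ['s1', 's2', 's3', 's1', 's2', 's3', 's1']
```

Notice that `pick` never looks at the servers themselves, which is exactly why round robin can't react to a slow or dead one.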
// THE SMART ONE

Least Connections

// "send it to whoever's least busy"

Track the number of active connections on each server. New request? Send it to whichever server has the fewest in-flight. Naturally avoids slow or struggling servers because their connection count piles up.

Strengths: Adapts to real-time load. Slow servers automatically get less traffic: their connection count stays high, so the LB picks them less often. Handles mixed-cost requests well.

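Weaknesses: Slightly more state to maintain. Can be fooled by very fast requests (counts stay near zero everywhere, so there's little signal to act on) or by long-lived, mostly idle connections such as background tasks (counts stay high on servers that aren't actually busy).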

Pick when: Requests vary in cost, or your servers are heterogeneous.
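A minimal Python sketch of the bookkeeping, with hypothetical `acquire`/`release` names. The extra state is one counter per server, and the catch is that the LB has to hear when each request finishes, not just when it arrives.

```python
class LeastConnections:
    """Send each request to the server with the fewest in-flight requests."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {s: 0 for s in servers}  # in-flight count per server

    def acquire(self) -> str:
        # min() returns the first server with the lowest count, so ties go
        # to whichever server was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request (or connection) on `server` finishes.
        self.active[server] -= 1


lb = LeastConnections(["s1", "s2", "s3"])
a = lb.acquire()     # s1
b = lb.acquire()     # s2
lb.release(a)        # s1's request finishes quickly
print(lb.acquire())  # → s1  (s1 and s3 are tied at 0; s1 is listed first)
```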
// THE STICKY ONE

IP Hash

// "same client → same server"

Hash the client's IP address; pick a server based on the result. Same client always hits the same server. Useful when you need session affinity — a cart sitting in memory on one specific server.

Strengths: Stateful in-memory sessions actually work. Each client gets a "home" server.

Weaknesses: Uneven distribution when client IPs cluster (e.g., many users behind one corporate NAT). Adding or removing servers re-shuffles many clients. Mostly a sign you should move state out of app memory anyway.

Pick when: You can't easily make the app stateless. Use sparingly — it's a smell.
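A sketch of IP hashing in Python. The `pick_by_ip` name and the choice of SHA-256 are illustrative assumptions, not what any particular LB does. The second half measures how many clients move when a fifth server joins the pool.

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server: hash(ip) % N."""
    # Use a stable hash: Python's built-in hash() of a str changes between
    # processes, which would break affinity every time the LB restarts.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["s1", "s2", "s3", "s4"]
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]

before = {ip: pick_by_ip(ip, servers) for ip in ips}
after = {ip: pick_by_ip(ip, servers + ["s5"]) for ip in ips}  # add one server
moved = sum(before[ip] != after[ip] for ip in ips)
print(f"{moved / len(ips):.0%} of clients changed servers")
```

With plain modulo, a client keeps its server only if hash % 4 == hash % 5, which holds for about one hash in five. So expect roughly 80% of clients to land on a new server, and to lose whatever session sat in memory on the old one.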
// THE WEIGHTED ONE

Weighted Round Robin

// "some servers are bigger"

Round robin, but each server has a weight (its relative capacity). A server weighted 3 gets three turns for every one that a server weighted 1 gets. Useful when your fleet isn't homogeneous, say, mixing 4-vCPU and 16-vCPU instances.

Strengths: Lets you scale heterogeneous fleets predictably. Useful during gradual rollouts ("send only 5% of traffic to the new version") or when retiring old hardware.

Weaknesses: Weights are configured by hand and go stale when instance types change. Doesn't react to real-time load.

Pick when: Your pool isn't uniform, or for canary deployments.
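The naive way to implement weights is to list each server as many times as its weight and round-robin over that list, which sends each server's turns back to back. Below is a Python sketch of "smooth" weighted round robin, one well-known alternative that spreads the turns across the cycle. The class name and the weights are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Weighted round robin that spreads each server's turns across the cycle.

    On every pick: add each server's weight to its running score, choose the
    highest score, then subtract the total of all weights from the winner.
    """

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = weights
        self.total = sum(weights.values())
        self.score = {s: 0 for s in weights}

    def pick(self) -> str:
        for s, w in self.weights.items():
            self.score[s] += w
        best = max(self.score, key=self.score.get)  # ties go to the first listed
        self.score[best] -= self.total
        return best


lb = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([lb.pick() for _ in range(7)])
# → ['a', 'a', 'b', 'a', 'c', 'a', 'a']  (naive expansion: a a a a a b c)

canary = SmoothWeightedRoundRobin({"stable": 19, "new": 1})
picks = [canary.pick() for _ in range(2000)]
print(picks.count("new") / len(picks))
# → 0.05
```

The 19:1 split is the canary case from above: one request in twenty, 5%, reaches the new version.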

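In practice, round robin is still the default in many load balancers (NGINX and HAProxy both use it unless told otherwise), but production setups with uneven servers or uneven requests often switch to Least Connections or a variant of it. A popular variant is "Power of Two Choices": pick two servers at random and send the request to whichever has fewer connections. That one extra comparison spreads load far more evenly than a purely random pick, at almost no cost. Round robin is the textbook example; production wants something that reacts to actual load.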
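A Python sketch of the Power of Two Choices pick, assuming the LB already tracks in-flight connections per server the way Least Connections does. The function name is hypothetical; a `random.Random` instance is passed in so runs can be seeded.

```python
import random


def power_of_two_choices(active: dict[str, int], rng: random.Random) -> str:
    """Sample two distinct servers at random; send to the one with fewer connections."""
    a, b = rng.sample(list(active), 2)  # needs at least two servers
    return a if active[a] <= active[b] else b


rng = random.Random(7)  # seeded so runs are repeatable
active = {"s1": 3, "s2": 0, "s3": 7, "s4": 1}  # in-flight connections per server
choices = [power_of_two_choices(active, rng) for _ in range(1000)]
print({s: choices.count(s) for s in active})
```

s3, the busiest server, never wins: whichever server it gets paired with has fewer connections. In a real LB the counts would change after every pick; they are frozen here to keep the sketch short.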

§ 04 — Load balancer simulator · interactive lab

Now play with one
and watch it misbehave.

Below: a live load balancer with 5 servers behind it. Pick an algorithm and hit Start. Watch how requests fan out. Then use the toggles to kill a server or make one slow — and see which algorithms adapt, which ones don't.

// WHAT'S HAPPENING
Round Robin · all servers healthy

Requests cycle through servers 1 → 2 → 3 → 4 → 5 → 1 → … Perfectly even distribution. Now try the disrupt buttons — kill a server or make one slow, and watch which algorithms adapt. Round Robin won't notice. Least Connections will.
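The slow-server experiment can also be sketched as a toy, text-only simulation in Python. Everything about the model is an assumption chosen for clarity: one request arrives per tick, each server works on one request at a time, and one of the five servers has become five times slower than the rest.

```python
from collections import deque


def simulate(algorithm: str, service_ticks: list[int], ticks: int = 1000) -> list[int]:
    """Return each server's backlog (waiting + in-service requests) at the end.

    Toy model: one request arrives per tick; each server works on one request
    at a time, and a request takes service_ticks[i] ticks on server i.
    """
    n = len(service_ticks)
    queues = [deque() for _ in range(n)]  # remaining work for each queued request
    counter = 0
    for _ in range(ticks):
        # 1. The LB routes the arriving request.
        if algorithm == "round_robin":
            target = counter % n
            counter += 1
        elif algorithm == "least_connections":
            target = min(range(n), key=lambda i: len(queues[i]))
        else:
            raise ValueError(f"unknown algorithm: {algorithm}")
        queues[target].append(service_ticks[target])
        # 2. Every server spends one tick on the request at the head of its queue.
        for q in queues:
            if q:
                q[0] -= 1
                if q[0] == 0:
                    q.popleft()
    return [len(q) for q in queues]


slow = [4, 4, 20, 4, 4]  # server index 2 has become 5x slower
print("round robin:      ", simulate("round_robin", slow))
print("least connections:", simulate("least_connections", slow))
```

Under round robin the slow server's backlog keeps growing for as long as the simulation runs: it receives a request every 5 ticks but can finish one only every 20. Under least connections it stays small, because the LB keeps routing around the queue. The model is deliberately crude (fixed timings, one request per tick), so treat it as a picture, not a benchmark.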

§ 05 — Health checks & the LB's own death

Two problems
worth solving early.

The simulator made one thing visible: when a server dies, the LB needs to stop sending traffic to it. The mechanism behind that is mundane but essential. The other thing nobody mentions about LBs is that the LB itself is a single point of failure unless you specifically design around it.

Health checks

// how the LB knows who's alive

Every few seconds, the LB sends a small request to each server — typically GET /health. If the server returns 200 OK, it stays in the pool. If it returns an error, times out, or doesn't respond at all, the LB marks it unhealthy and stops routing to it.

Apps should expose a real /health endpoint that returns 200 only when they can actually serve traffic: the database is reachable, there's memory to spare, the process isn't shutting down. A bad health check is worse than none, because a server that says "healthy" while broken means the LB keeps sending requests into the void.
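A minimal sketch of such an endpoint using Python's standard-library `http.server`. The two check functions are hypothetical stand-ins that always pass; in a real app they might run a trivial query against the database and read a shutdown flag. The endpoint answers 503 instead of 200 when any check fails, so an LB that expects 200 treats the server as unhealthy.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


# Hypothetical checks: stand-ins that always pass. A real app might run a
# trivial query against its database, or read a flag set on SIGTERM.
def database_reachable() -> bool:
    return True


def not_shutting_down() -> bool:
    return True


CHECKS: dict[str, Callable[[], bool]] = {
    "database": database_reachable,
    "shutdown": not_shutting_down,
}


def health_status(checks: dict[str, Callable[[], bool]]) -> tuple[int, str]:
    """200 only if every check passes; otherwise 503 naming what failed."""
    failing = [name for name, check in checks.items() if not check()]
    if failing:
        return 503, "unhealthy: " + ", ".join(failing)
    return 200, "ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(CHECKS)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, run `HTTPServer(("", 8080), HealthHandler).serve_forever()` (the port is arbitrary) and point the LB's check at `/health` on that port.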

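Typical setups probe every few seconds to a few tens of seconds, depending on the product, and require several consecutive failed checks before marking a server down (and several consecutive passes before bringing it back), to avoid flapping on brief blips.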
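A Python sketch of how an LB might turn individual probes into an up/down decision without flapping. `probe` does one `GET /health` with a timeout; `BackendHealth.record` only changes state after `fall` failures in a row, or `rise` successes in a row to come back. The names mirror HAProxy's `fall`/`rise` server settings, but the defaults here are illustrative assumptions.

```python
import http.client
import urllib.request
from dataclasses import dataclass


def probe(url: str, timeout: float = 2.0) -> bool:
    """One health check: True only if the server answers 200 within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Error statuses raise HTTPError (a URLError, which is an OSError);
        # refused connections and timeouts are OSErrors as well; garbled
        # responses surface as http.client.HTTPException.
        return False


@dataclass
class BackendHealth:
    healthy: bool = True
    fails: int = 0   # consecutive failed probes
    passes: int = 0  # consecutive successful probes

    def record(self, ok: bool, fall: int = 3, rise: int = 2) -> None:
        """Mark down after `fall` failures in a row; back up after `rise` passes."""
        if ok:
            self.fails = 0
            self.passes += 1
            if not self.healthy and self.passes >= rise:
                self.healthy = True
        else:
            self.passes = 0
            self.fails += 1
            if self.healthy and self.fails >= fall:
                self.healthy = False


h = BackendHealth()
for ok in [False, True, False, False]:  # a brief blip: never 3 failures in a row
    h.record(ok)
print(h.healthy)  # → True
for ok in [False, False, False]:        # a real outage
    h.record(ok)
print(h.healthy)  # → False
```

A scheduler would call `record(probe(url))` for each server once per interval and route only to servers whose `healthy` flag is set.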

The LB itself is a SPOF

// who watches the watchman?

You put a load balancer in front of N servers to remove the single-point-of-failure problem. But now the load balancer is the single point of failure. If the LB dies, every server behind it becomes unreachable. Congratulations: you've moved the SPOF, not removed it.

The fix: run multiple load balancers. Two or more LBs, each capable of forwarding to the same pool. Then make sure clients always reach one that's alive: DNS-level routing can stop handing out the address of a dead LB, while anycast IP or VIP failover keeps a single address and makes sure a healthy machine answers on it. Managed cloud LBs (AWS ALB, Cloudflare, Google Cloud LB) handle this for you transparently; they run as redundant fleets spread across availability zones or data centers.
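A Python sketch of the failover decision behind a VIP, loosely modeled on VRRP (the protocol tools like keepalived implement): LBs send heartbeats, and the highest-priority LB still heard from owns the shared address. The names, priorities, and 3-second timeout are hypothetical, and a real implementation also has to actually move the IP on the network, which this sketch skips.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LBNode:
    name: str
    priority: int          # higher priority wins, as in VRRP
    last_heartbeat: float  # when we last heard from this LB, in seconds


def vip_owner(nodes: list[LBNode], now: float, timeout: float = 3.0) -> Optional[str]:
    """Pick which LB should answer on the shared virtual IP right now.

    An LB counts as alive if its last heartbeat is at most `timeout` seconds
    old; among the living, the highest priority owns the VIP.
    """
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if not alive:
        return None  # every LB is down: nothing can answer on the VIP
    return max(alive, key=lambda n: n.priority).name


nodes = [LBNode("lb-a", priority=200, last_heartbeat=100.0),
         LBNode("lb-b", priority=100, last_heartbeat=100.0)]
print(vip_owner(nodes, now=101.0))  # → lb-a  (both alive, lb-a has priority)

nodes[0].last_heartbeat = 95.0      # lb-a has gone silent
nodes[1].last_heartbeat = 100.5
print(vip_owner(nodes, now=101.0))  # → lb-b  (lb-a's heartbeat is 6 s old)
```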

The general rule: if removing one thing takes down your whole site, that thing needs a buddy. Apply recursively.

Put together, these two ideas — health checks below the LB, redundancy above it — turn a single load balancer into a self-healing layer. A server crashes? The LB stops using it within seconds. An LB crashes? DNS shifts traffic to the surviving one. The whole system absorbs failures that, on a single-server setup, would have been outages. That's what "highly available" really means: nothing is special enough that its death breaks the system.

§ 06 — Eight words for the routing layer

Vocabulary,
for the switchboard.

You'll meet these in every architecture diagram from here on. Get them comfortable.

Load Balancer
/loʊd ˈbælənsər/
A component that distributes incoming requests across a pool of servers. The entry point for any horizontally scaled system.
Upstream
/ˈʌpstriːm/
The servers behind the load balancer — the things receiving forwarded traffic. NGINX calls them upstreams; cloud LBs call them targets or backends.
Health Check
/hɛlθ tʃɛk/
A periodic probe the LB sends to each upstream (usually GET /health) to confirm it's alive. Failed probes remove the server from the rotation.
Session Affinity
/əˈfɪnɪti/
Also "sticky sessions." When the LB pins a client to a specific server for the duration of their session. Needed when state lives in app memory; usually a sign to refactor that.
Layer 4 / Layer 7
/leɪər fɔːr/
L4 LBs route on transport-layer information (just IPs and ports, for TCP or UDP traffic): fast, but blind to content. L7 LBs read HTTP, so they can route by URL path, headers, or cookies: slower, smarter. Most modern LBs are L7.
Round Robin
/raʊnd ˈrɒbɪn/
The simplest routing algorithm: cycle through servers in order. Even distribution; blind to load. The default in many configurations.
Reverse Proxy
/rɪˈvɜːs ˈprɒksi/
A server that sits between clients and backends, forwarding requests. Most load balancers, and every Layer 7 one, work as reverse proxies. NGINX and HAProxy are common examples.
VIP
/viː aɪ piː/
"Virtual IP." A single IP address that can be served by multiple physical machines (failover, anycast). A key mechanism behind highly available load balancers.
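To make the Layer 4 / Layer 7 distinction concrete: an L7 balancer has parsed the HTTP request, so it can choose a server pool from the URL before applying any of the routing algorithms. A Python sketch with hypothetical pool names and prefixes; an L4 balancer never sees the path, only addresses and ports.

```python
def route_by_path(path: str, routes: dict[str, list[str]]) -> list[str]:
    """Layer 7 routing: choose a server pool by the longest matching URL prefix."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path}")
    return routes[max(matches, key=len)]


ROUTES = {
    "/": ["web-1", "web-2"],
    "/api/": ["api-1", "api-2", "api-3"],
    "/static/": ["static-1"],
}
print(route_by_path("/api/orders/42", ROUTES))  # → ['api-1', 'api-2', 'api-3']
print(route_by_path("/about", ROUTES))          # → ['web-1', 'web-2']
```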
§ 07 — Knowledge check

Five questions.
Route the traffic.

Test the intuition. Pick an answer; the explanation drops in instantly.


Routed.

You see what the LB is doing now. Onward to DNS — the layer in front of the load balancer.

§ 08 — The recap

Three ideas to
carry forward.

The LB is the most invisible part of modern systems — and the one whose absence breaks everything.

i

The LB is one address, many rooms

Clients hit one hostname. Behind it, the LB picks a server per request and forwards. The cluster behind the curtain is invisible.

ii

Algorithms matter under stress

Round robin is fine when everything's healthy. Least Connections adapts when servers slow down. IP hash sacrifices flexibility for affinity.

iii

Health checks + redundancy = highly available

Below the LB: probe each server, drop the dead ones. Above the LB: run multiple LBs with DNS or VIP failover. Then nothing is special.

↓ UP NEXT

M.11 — DNS,
the internet's address book.

The client typed api.systemdesigntutorial.com. Something has to turn that string into an IP address before the request can even leave the machine. Behind that translation is one of the largest distributed databases ever built — and it gets queried billions of times per second. Time to look inside.
