Module 08 / 20 · Phase B — Data & Storage · finale · 40 min

Latency, throughput
& the speed of light.

Two numbers, often confused. One you can engineer around. The other is set by physics. Once you can tell them apart — and know where each one breaks — every performance conversation gets sharper.

// What you'll know by the end

  • Latency vs throughput, finally distinguished
  • Where the milliseconds actually come from
  • Why a London → Tokyo round trip can never beat ~95ms
  • The latency numbers every engineer should know

§ 01 — Two words, often confused

"Make it faster"
is two questions.

Walk into any performance review and someone will say "the system needs to be faster." That sentence hides two completely different questions. One asks "how long does a single request take?" The other asks "how many requests per second can we handle?" Those are different problems with different solutions, and confusing them is a great way to optimize the wrong thing.

// TWO PROBLEMS THAT BOTH SOUND LIKE "FASTER"
// LATENCY
How long?
"My single request takes 800ms — can we get it under 100?"
Duration of one operation, end to end. Measured in milliseconds per request. Lower is better. What the user feels when they click.
Fix with: indexes, caches, geographic proximity, fewer round trips.
// THROUGHPUT
How many?
"We can only handle 200 requests/second — can we get to 2,000?"
Rate of completed work. Measured in requests per second (or per minute). Higher is better. What capacity planning is about.
Fix with: more servers, async processing, batching, load balancing.

The hidden trap: improving one doesn't always improve the other. Adding more servers boosts throughput, but it does not make an individual request run any faster unless requests were waiting in a queue for a free server. Caching results reduces latency but may not move throughput much if the underlying database wasn't the bottleneck. "Faster" is two questions. Always know which one you're answering.
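The arithmetic behind that trap can be sketched in a few lines. This is a deliberately simple model, not a capacity-planning tool: it assumes each server works on one request at a time and that nothing is queueing. The function name `throughput_rps` is made up for illustration; the 800ms and 200 req/s figures come from the examples above.

```python
def throughput_rps(servers: int, latency_s: float) -> float:
    """Requests per second for a fleet where each server handles
    one request at a time (a simplifying assumption)."""
    return servers / latency_s


LATENCY_S = 0.8  # the 800 ms request from the example above

for servers in (160, 1600):
    rps = throughput_rps(servers, LATENCY_S)
    # Throughput scales with the fleet; per-request latency does not move.
    print(f"{servers:>5} servers -> {rps:6.0f} req/s, "
          f"each request still takes {LATENCY_S * 1000:.0f} ms")
```

Ten times the servers gives ten times the requests per second, while the user who clicked still waits 800ms.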

§ 02 — The pipe metaphor

A pipe has
two dimensions.

Picture a water pipe. Its length determines how long it takes for water to travel from one end to the other — that's latency. Its diameter determines how much water can pass through per second — that's throughput. You can have a long fat pipe (high throughput, high latency) or a short thin one (low throughput, low latency). They're independent properties of the same connection.

// FOUR PIPES · SAME DRAWING, DIFFERENT MEANINGS

  • SHORT THIN · low latency · low throughput → same DC, slow link · fast trip, narrow band
  • SHORT FAT · low latency · high throughput → ideal · same DC, fat link · the dream
  • LONG THIN · high latency · low throughput → mobile in remote area
  • LONG FAT · high latency · high throughput → transoceanic fiber

The transoceanic fiber pipe is fascinating: it can move enormous data per second (high throughput), but a single request still pays the round-trip cost (high latency). When you upload a 1GB video to a server in Sydney, the bandwidth determines how long the bulk transfer takes. But when you make a quick API call, the round trip is what you feel — bandwidth barely matters.
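A rough way to see this is to model a transfer as one round trip plus the time to push the bytes onto the link. The sketch below ignores handshakes, TCP slow start, and congestion, and the 300ms round trip is an assumed value for illustration, not a measurement. `transfer_time_s` is a hypothetical helper name.

```python
def transfer_time_s(payload_bytes: float, bandwidth_bps: float, rtt_s: float) -> float:
    """Rough model: one round trip plus the time to push the payload onto the link."""
    return rtt_s + payload_bytes * 8 / bandwidth_bps


RTT_S = 0.300  # assumed round trip to a distant server, for illustration only

results = {}
for label, payload_bytes in (("1 GB video upload", 1e9), ("1 KB API call", 1e3)):
    for gbps in (1, 10):
        seconds = transfer_time_s(payload_bytes, gbps * 1e9, RTT_S)
        results[(label, gbps)] = seconds
        print(f"{label:<18} @ {gbps:>2} Gbps: {seconds:8.4f} s")
```

Under these assumptions, ten times the bandwidth cuts the upload from about 8.3s to about 1.1s, while the API call stays at roughly 0.3s either way.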

Latency: how long for one. Throughput: how many per second. Independent.

This is why "make it faster" needs the follow-up: for one request, or for total traffic? Caching makes a request faster. More servers handle more requests. The two problems have different fixes — and applying the wrong one wastes time and money.

§ 03 — Where milliseconds come from

Every millisecond
has an origin.

When you make a network request, the total latency is the sum of four distinct delays. Optimizing latency means knowing which of the four is dominating — because the fixes for each are different.

Source · What's happening · Typical

Propagation // physics · Typical: ~5ms / 1000km
The time for the signal itself to travel through the medium. In fiber, light moves at about 200,000 km/s, roughly 2/3 the speed of light in vacuum. This is the floor: you cannot beat it.

Transmission // bandwidth · Typical: μs to ms
The time to push the bytes onto the wire. A 1KB packet on a 1Gbps link takes ~10μs. Large payloads make this larger. Wider pipe = less transmission delay.

Processing // CPU · Typical: μs per hop
Routers and servers reading, deciding, and forwarding packets. Each hop adds a tiny bit. Server-side application logic adds more. Cheaper hardware = more delay here.

Queueing // congestion · Typical: 0 to ∞
When too many packets arrive at a router or server, they wait in line. Under load, this dominates everything else. The most variable component: it's what makes "p99 latency" spiky.

For a request that crosses the planet, propagation dominates — you can't make light go faster. For a request inside the same data center, processing and queueing dominate — propagation is microseconds. For uploads of huge files, transmission dominates — you need bandwidth. Knowing which is your bottleneck tells you which lever to pull. Tuning the wrong one wastes weeks.
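The three cases above can be sketched as a small calculation that adds up the four delays for one direction of travel and reports which one dominates. Propagation and transmission follow from the figures in the table; the processing and queueing values are invented inputs chosen only to illustrate each scenario, and `delay_breakdown` and `dominant` are hypothetical helper names.

```python
FIBER_KM_PER_S = 200_000  # light in fiber, i.e. ~5 ms per 1000 km


def delay_breakdown(distance_km, payload_bytes, bandwidth_bps, processing_s, queueing_s):
    """One-way delay split into the four sources, in seconds."""
    return {
        "propagation": distance_km / FIBER_KM_PER_S,
        "transmission": payload_bytes * 8 / bandwidth_bps,
        "processing": processing_s,   # assumed input, not derived
        "queueing": queueing_s,       # assumed input, not derived
    }


def dominant(breakdown):
    """Name of the largest component."""
    return max(breakdown, key=breakdown.get)


scenarios = {
    "cross-planet API call": delay_breakdown(10_000, 1_000, 1e9, 200e-6, 1e-3),
    "same data center under load": delay_breakdown(0.5, 1_000, 1e9, 200e-6, 5e-3),
    "1 GB upload across a city": delay_breakdown(50, 1e9, 1e9, 200e-6, 1e-3),
}

for name, parts in scenarios.items():
    total_ms = sum(parts.values()) * 1000
    print(f"{name:<28} total {total_ms:10.3f} ms, dominated by {dominant(parts)}")
```

The point of the sketch is the habit, not the numbers: measure or estimate each component before choosing a fix.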

§ 04 — Speed of light race · interactive lab

Now race a packet
against physics.

Pick a city pair and hit Send packet. You'll see the actual route an HTTPS request travels, alongside the theoretical minimum dictated by the speed of light through fiber. The gap is engineering overhead — every router, every TLS handshake, every queue. But the floor is set by physics, not by Cloudflare.

[Interactive lab · SPEED_OF_LIGHT.SIM // m.08 lab: pick a city pair and press Send. Readouts: Distance (km) · Light-speed floor, RTT (ms) · Real-world typical, RTT (ms) · Engineering overhead (ms).]

Each route has a different gap between physics and reality. Short-distance routes (same continent) are dominated by routing and processing — engineering matters a lot. Long-distance routes are dominated by propagation — even Google can't beat the speed of light.
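The light-speed floor can be approximated offline. The sketch below takes the great-circle distance between two cities (haversine formula) and divides by the speed of light in fiber. Real cables do not follow great circles, so the true floor is somewhat higher. The coordinates are approximate city centers, and the 210ms "measured" value is an illustrative assumption, not a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_S = 200_000.0  # ~2/3 of the speed of light in vacuum

# Approximate (latitude, longitude) in degrees.
CITIES = {
    "London": (51.5074, -0.1278),
    "Tokyo": (35.6762, 139.6503),
    "New York": (40.7128, -74.0060),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rtt_floor_ms(distance_km):
    """Round-trip time if the signal only had to travel through fiber."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


distance = great_circle_km(CITIES["London"], CITIES["Tokyo"])
floor = rtt_floor_ms(distance)
measured_rtt_ms = 210.0  # hypothetical real-world value, for illustration only

print(f"Distance:             {distance:8.0f} km")
print(f"Light-speed floor:    {floor:8.1f} ms RTT")
print(f"Engineering overhead: {measured_rtt_ms - floor:8.1f} ms")
```

For London and Tokyo the floor comes out in the mid-90s of milliseconds; everything above that is overhead that engineering can, in principle, reduce.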

§ 05 — Numbers every engineer should know

The famous
Jeff Dean numbers.

In a now-legendary talk, Google's Jeff Dean shared a list of operation latencies that became required reading for backend engineers. The numbers below are updated for modern hardware. Memorize the orders of magnitude: you'll use them in nearly every design decision.

// LATENCY TIERS · ACTUAL TIME AND IF 1ns = 1s

Operation · Actual time · If 1ns = 1 second

L1 cache reference · ~0.5 ns · 0.5 sec
Branch mispredict · ~5 ns · 5 sec
L2 cache reference · ~7 ns · 7 sec
Mutex lock / unlock · ~25 ns · 25 sec
Main memory reference · ~100 ns · ~2 minutes
Compress 1KB w/ Snappy · ~3 μs · ~50 minutes
Send 1KB over 1Gbps network · ~10 μs · ~3 hours
SSD random read · ~16 μs · ~4.5 hours
Read 1MB sequentially from memory · ~250 μs · ~3 days
Round trip within same data center · ~500 μs · ~6 days
Read 1MB sequentially from SSD · ~1 ms · ~12 days
Read 1MB from spinning disk · ~20 ms · ~8 months
Packet US → Europe → US (round trip) · ~150 ms · ~5 years
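The "If 1ns = 1 second" column is plain unit conversion: read each nanosecond as one second and express the result in the largest unit that fits. A minimal sketch follows; `humanize_scaled` is a hypothetical helper, it treats a month as 30 days and a year as 365 days, and it rounds less aggressively than the table does.

```python
def humanize_scaled(ns: float) -> str:
    """Read each nanosecond as one second and pick a readable unit."""
    seconds = ns  # 1 ns -> 1 s
    units = [
        ("years", 365 * 86_400),
        ("months", 30 * 86_400),
        ("days", 86_400),
        ("hours", 3_600),
        ("minutes", 60),
    ]
    for name, size in units:
        if seconds >= size:
            return f"~{seconds / size:.1f} {name}"
    return f"{seconds:g} sec"


# A few rows from the table, in nanoseconds.
rows = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "SSD random read": 16_000,
    "Round trip within same data center": 500_000,
    "Read 1MB from spinning disk": 20_000_000,
    "Packet US -> Europe -> US": 150_000_000,
}

for operation, ns in rows.items():
    print(f"{operation:<36} {humanize_scaled(ns)}")
```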

The reason these numbers matter is that they reveal the order of magnitude of every choice you make. A function call that touches memory 1000 times is fast. A function call that hits the SSD 1000 times is slow. One that makes 1000 network calls is broken. Same code, completely different consequences.

And one last observation, because it's the heart of this whole module: each tier is roughly 100× slower than the previous one, give or take a factor of a few. By the table above, RAM is ~200× slower than L1 cache, an SSD random read is ~160× slower than RAM, a round trip inside the data center is ~30× slower than an SSD read, and a cross-region round trip is ~300× slower again. Designing systems is largely about deciding which tier each operation belongs in, and minimizing trips down the ladder.
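Both observations can be checked against the table's own figures. The sketch below picks one representative operation per tier (that choice is an assumption made here for illustration), prices 1000 sequential operations at each tier, and prints the slowdown from one tier to the next.

```python
# One representative latency per tier, in nanoseconds, taken from the table.
TIER_NS = {
    "CPU cache (L1 reference)": 0.5,
    "RAM (main memory reference)": 100,
    "SSD (random read)": 16_000,
    "Network (same data center RTT)": 500_000,
    "Cross-region (US -> Europe -> US)": 150_000_000,
}


def cost_of_calls_ms(per_call_ns: float, calls: int = 1000) -> float:
    """Total time for `calls` sequential operations, in milliseconds."""
    return per_call_ns * calls / 1e6


names = list(TIER_NS)
step_ratios = [TIER_NS[b] / TIER_NS[a] for a, b in zip(names, names[1:])]

for name in names:
    print(f"1000 x {name:<34} = {cost_of_calls_ms(TIER_NS[name]):12.4f} ms")

for (a, b), ratio in zip(zip(names, names[1:]), step_ratios):
    print(f"{a} -> {b}: {ratio:.0f}x slower")
```

With these figures, 1000 memory references cost about 0.1ms, 1000 SSD reads about 16ms, and 1000 cross-region round trips about 150 seconds.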

§ 06 — Eight words for performance

Vocabulary,
for the limit.

These are the words you'll use when investigating any slow system. Get fluent.

Latency
/ˈleɪtənsi/
Duration of a single operation from start to finish. Measured in ms. Lower is better. What the user feels.
Throughput
/ˈθruːpʊt/
Rate of completed work. Measured in req/s or MB/s. Higher is better. What capacity planning is about.
Round-Trip Time
/raʊnd trɪp taɪm/
Time for a message to go from A to B and a reply to come back. Often abbreviated RTT. Pays the one-way propagation delay twice, once in each direction.
Bandwidth
/ˈbændwɪdθ/
The maximum rate of data transfer on a link. 1 Gbps = roughly 125MB/sec. Bandwidth is the width of the pipe; latency is the length.
p50 / p95 / p99
/piː fɪfti…/
Percentile latencies. p99 = "99% of requests finished faster than this." Tail latencies (p95, p99) reveal pain that p50 (median) hides.
Jitter
/ˈdʒɪtə/
The variation in latency over time. The average might be 50ms, but if individual requests swing by 200ms the system feels broken. Real-time systems hate jitter.
Propagation
/ˌprɒpəˈɡeɪʃən/
The physical travel time of a signal through a medium. In fiber: ~5ms per 1000km. The component you can't engineer away.
Bandwidth-Delay Product
/ˈbændwɪdθ dɪˈleɪ/
Bandwidth × round-trip time. Measures how much data is "in flight" at once. Used to size TCP buffers and understand throughput on long links.
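Two of these terms are easiest to grasp as arithmetic. The sketch below computes percentiles with the nearest-rank method (one of several common definitions) on a made-up sample of 100 latencies, then computes a bandwidth-delay product for a 1 Gbps link with a 150ms round trip. The sample data and the helper names `percentile` and `bdp_bytes` are assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least
    p percent of all samples at or below it (p is an integer 1..100)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that can be in flight at once."""
    return bandwidth_bps * rtt_s / 8


# Made-up sample: most requests are quick, a couple are very slow.
latencies_ms = [20] * 94 + [40] * 4 + [900, 1200]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean {mean_ms:.1f} ms")
for p in (50, 95, 99):
    print(f"p{p}  {percentile(latencies_ms, p)} ms")

# 1 Gbps link, 150 ms round trip (the US -> Europe -> US figure).
print(f"BDP  {bdp_bytes(1e9, 0.150) / 1e6:.2f} MB in flight")
```

In this sample the median is 20ms and p95 is 40ms, yet p99 is 900ms: the tail carries the pain the median hides. The bandwidth-delay product works out to about 18.75 MB, which is roughly how much buffer a sender would need to keep that link full.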

§ 07 — Knowledge check

Five questions.
Mind the physics.

Locking in the intuition. Pick an answer; the explanation appears immediately.


Pragmatist.

You can see latency and throughput as separate problems now. Phase B is locked in. Onward to scaling.

§ 08 — The recap

Three ideas to
carry forward.

Phase B done. These three ideas carry into everything about scale that follows.

i

"Faster" is two questions

Latency (per request) and throughput (per second) are different problems with different fixes. Always know which one you're optimizing.

ii

Physics sets the floor

Light takes ~5ms per 1000km in fiber, so a London ↔ Tokyo round trip will always cost ~95ms at minimum. No engineering reduces that; only proximity does.

iii

Each tier is ~100× slower

CPU → RAM → SSD → network → cross-region. Designing fast systems is about minimizing how many tiers down each operation drops.

PHASE B COMPLETE Data & Storage · 4 modules · Up next: Phase C — Scale & Reliability
↓ UP NEXT · PHASE C BEGINS

M.09 — Vertical vs Horizontal
scaling.

You know how data is stored and how fast it can travel. Now the question that defines real systems: what do you do when one server isn't enough? Two answers, two completely different futures. Phase C begins.
