Latency vs Throughput — p50, p99 Percentiles and Little's Law

§ 01 — Two words, often confused

"Make it faster"
is two questions.

Walk into any performance review and someone will say "the system needs to be faster." That sentence hides two completely different questions. One asks: how long does a single request take? The other asks: how many requests per second can we handle? Those are different problems with different solutions, and confusing them is a great way to optimize the wrong thing.

// TWO PROBLEMS THAT BOTH SOUND LIKE "FASTER"

// LATENCY

How long?

"My single request takes 800ms — can we get it under 100?"

Duration of one operation, end to end. Measured in milliseconds per request. Lower is better. What the user feels when they click.

Fix with: indexes, caches, geographic proximity, fewer round trips.

// THROUGHPUT

How many?

"We can only handle 200 requests/second — can we get to 2,000?"

Rate of completed work. Measured in requests per second (or per minute). Higher is better. What capacity planning is about.

Fix with: more servers, async processing, batching, load balancing.

The hidden trap: improving one doesn't always improve the other. Adding more servers boosts throughput but does little for the latency of a single request, unless requests were already waiting in a queue. Caching results reduces latency but may not move throughput much if the underlying database wasn't the bottleneck. "Faster" is two questions. Always know which one you're answering.
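The title names Little's Law, which gives one way to connect the two quantities: in a stable system, the average number of requests in flight equals throughput times average latency. The sketch below rearranges that to estimate throughput from concurrency and latency. The function name and the sample figures are illustrative assumptions, not measurements.

```python
def throughput_rps(in_flight: float, latency_s: float) -> float:
    """Little's Law rearranged: throughput = requests in flight / average latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return in_flight / latency_s


# Hypothetical service: every request takes 100 ms end to end.
latency_s = 0.100

# 20 requests in flight at once sustain about 200 requests per second.
small_fleet = throughput_rps(in_flight=20, latency_s=latency_s)

# Ten times the concurrency (for example, ten times the servers) sustains
# about 2,000 requests per second, yet each request still takes 100 ms.
large_fleet = throughput_rps(in_flight=200, latency_s=latency_s)

print(f"{small_fleet:.0f} req/s -> {large_fleet:.0f} req/s at {latency_s * 1000:.0f} ms each")
```

Throughput grew tenfold while latency did not move, which is the sense in which the two are separate questions.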

§ 02 — The pipe metaphor

A pipe has
two dimensions.

Picture a water pipe. Its length determines how long it takes for water to travel from one end to the other — that's latency. Its diameter determines how much water can pass through per second — that's throughput. You can have a long fat pipe (high throughput, high latency) or a short thin one (low throughput, low latency). They're independent properties of the same connection.

// FOUR PIPES · SAME DRAWING, DIFFERENT MEANINGS

The transoceanic fiber pipe is fascinating: it can move enormous data per second (high throughput), but a single request still pays the round-trip cost (high latency). When you upload a 1GB video to a server in Sydney, the bandwidth determines how long the bulk transfer takes. But when you make a quick API call, the round trip is what you feel — bandwidth barely matters.
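A rough way to see why bandwidth matters for the upload and the round trip matters for the API call is to model total time as one round trip plus payload size divided by bandwidth. This is a deliberately simplified sketch: it ignores connection setup, TCP slow start, and packet loss, and the link figures (100 Mbps, 250 ms RTT) are assumptions chosen for illustration.

```python
def transfer_time_s(payload_bytes: float, bandwidth_bps: float, rtt_s: float) -> float:
    """Rough model: one round trip plus the time to push the bytes through the link."""
    return rtt_s + (payload_bytes * 8) / bandwidth_bps


# Assumed link to a distant server: 100 Mbps of bandwidth, 250 ms round trip.
BANDWIDTH_BPS = 100e6
RTT_S = 0.250

video = transfer_time_s(1e9, BANDWIDTH_BPS, RTT_S)     # 1 GB upload
api_call = transfer_time_s(2e3, BANDWIDTH_BPS, RTT_S)  # 2 KB API call

print(f"1 GB upload: {video:.1f} s, round trip is {RTT_S / video:.1%} of it")
print(f"2 KB call:   {api_call * 1000:.2f} ms, round trip is {RTT_S / api_call:.1%} of it")
```

Under these assumptions the round trip is a rounding error for the upload and almost the entire cost of the small call.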

Latency: how long for one. Throughput: how many per second. Independent.

This is why "make it faster" needs the follow-up: for one request, or for total traffic? Caching makes a request faster. More servers handle more requests. The two problems have different fixes — and applying the wrong one wastes time and money.

§ 03 — Where milliseconds come from

Every millisecond
has an origin.

When you make a network request, the total latency is the sum of four distinct delays. Optimizing latency means knowing which of the four is dominating — because the fixes for each are different.

Propagation // physics

The time for the signal itself to travel through the medium. In fiber, light moves at about 200,000 km/s, roughly 2/3 the speed of light in vacuum. This is the floor: you cannot beat it.

Typical: ~5 ms per 1,000 km (one way)

Transmission // bandwidth

The time to push the bytes onto the wire. A 1 KB packet on a 1 Gbps link takes about 8 μs (8,000 bits at 10^9 bits per second), usually rounded to ~10 μs. Larger payloads take proportionally longer. Wider pipe = less transmission delay.

Typical: μs to ms

Processing // CPU

Routers and servers reading, deciding, and forwarding packets. Each hop adds a little. Server-side application logic adds more. Slower hardware = more delay here.

Typical: μs per hop

Queueing // congestion

When packets arrive at a router or server faster than it can handle them, they wait in line. Under load, this dominates everything else. It is the most variable component, and it's what makes p99 latency spiky.

Typical: 0 to ∞

For a request that crosses the planet, propagation dominates — you can't make light go faster. For a request inside the same data center, processing and queueing dominate — propagation is microseconds. For uploads of huge files, transmission dominates — you need bandwidth. Knowing which is your bottleneck tells you which lever to pull. Tuning the wrong one wastes weeks.
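The three scenarios above can be sketched as a small model that adds up the four delays and reports which one dominates. The `Path` class and every sample value (distances, payload sizes, processing and queueing times) are hypothetical, picked only to illustrate the pattern; the 200,000 km/s fiber speed comes from the text.

```python
from dataclasses import dataclass

FIBER_KM_PER_S = 200_000  # signal speed in fiber, as quoted in the text


@dataclass
class Path:
    distance_km: float    # length of the route
    payload_bytes: float  # size of the message
    bandwidth_bps: float  # link capacity in bits per second
    processing_s: float   # assumed total processing time
    queueing_s: float     # assumed time spent waiting in queues

    def delays_ms(self) -> dict:
        """One-way delay split into the four components, in milliseconds."""
        return {
            "propagation": self.distance_km / FIBER_KM_PER_S * 1000,
            "transmission": self.payload_bytes * 8 / self.bandwidth_bps * 1000,
            "processing": self.processing_s * 1000,
            "queueing": self.queueing_s * 1000,
        }

    def dominant(self) -> str:
        """Name of the largest of the four delays."""
        delays = self.delays_ms()
        return max(delays, key=delays.get)


# Illustrative scenarios; all numbers are assumptions.
cross_planet = Path(10_000, 1_000, 1e9, 0.001, 0.002)
same_data_center = Path(0.5, 1_000, 10e9, 0.0002, 0.0005)
huge_upload = Path(100, 1e9, 1e9, 0.001, 0.002)

for name, path in [("cross-planet", cross_planet),
                   ("same data center", same_data_center),
                   ("1 GB upload", huge_upload)]:
    print(f"{name}: dominated by {path.dominant()} ({sum(path.delays_ms().values()):.2f} ms total)")
```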

§ 04 — Speed of light race · interactive lab

Now race a packet
against physics.

Pick a city pair and hit Send packet. You'll see the actual route an HTTPS request travels, alongside the theoretical minimum dictated by the speed of light through fiber. The gap is engineering overhead — every router, every TLS handshake, every queue. But the floor is set by physics, not by Cloudflare.

SPEED_OF_LIGHT.SIM // m.08 lab

// Route details: distance (km), light-speed floor RTT (ms), real-world typical RTT (ms), engineering overhead (ms)

Each route has a different gap between physics and reality. Short-distance routes (same continent) are dominated by routing and processing — engineering matters a lot. Long-distance routes are dominated by propagation — even Google can't beat the speed of light.
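The light-speed floor for a city pair can be approximated from the great-circle distance and the ~200,000 km/s fiber speed quoted earlier. The sketch below is an assumption-laden estimate, not the lab's own calculation: it uses the haversine formula on approximate city coordinates, and real cables never follow the great circle, so the true floor is somewhat higher. The observed RTT passed to `overhead_ms` is a hypothetical placeholder, not a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_S = 200_000  # signal speed in fiber, as quoted in the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rtt_floor_ms(distance_km):
    """Minimum round-trip time through fiber over the given distance."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def overhead_ms(observed_rtt_ms, distance_km):
    """Gap between an observed RTT and the physical floor."""
    return observed_rtt_ms - rtt_floor_ms(distance_km)


# Approximate coordinates for London and Tokyo.
distance = great_circle_km(51.5074, -0.1278, 35.6762, 139.6503)
floor = rtt_floor_ms(distance)

print(f"distance ~{distance:.0f} km, RTT floor ~{floor:.0f} ms")
print(f"overhead at a hypothetical 230 ms observed RTT: ~{overhead_ms(230, distance):.0f} ms")
```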

§ 05 — Numbers every engineer should know

The famous
Jeff Dean numbers.

In a now-legendary talk, Google's Jeff Dean shared a list of operation latencies that became required reading for backend engineers. The numbers below are approximate and updated for more recent hardware. Memorize the orders of magnitude rather than the exact values: you'll use them in nearly every design decision.

// LATENCY TIERS · ACTUAL TIME AND IF 1ns = 1s

Operation                                Actual time    If 1 ns = 1 s
L1 cache reference                       ~0.5 ns        0.5 sec
Branch mispredict                        ~5 ns          5 sec
L2 cache reference                       ~7 ns          7 sec
Mutex lock / unlock                      ~25 ns         25 sec
Main memory reference                    ~100 ns        ~2 minutes
Compress 1KB w/ Snappy                   ~3 μs          ~50 minutes
Send 1KB over 1Gbps network              ~10 μs         ~3 hours
SSD random read                          ~16 μs         ~4.5 hours
Read 1MB sequentially from memory        ~250 μs        ~3 days
Round trip within same data center       ~500 μs        ~6 days
Read 1MB sequentially from SSD           ~1 ms          ~12 days
Read 1MB from spinning disk              ~20 ms         ~8 months
Packet US → Europe → US (round trip)     ~150 ms        ~5 years

The reason these numbers matter is that they reveal the order of magnitude of every choice you make. A function call that touches memory 1000 times is fast. A function call that hits the SSD 1000 times is slow. One that makes 1000 network calls is broken. Same code, completely different consequences.
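The "1000 times" comparison can be worked through directly from the latency figures listed above. This is a back-of-the-envelope sketch that assumes the operations run one after another with no caching, batching, or parallelism; the dictionary keys and function name are illustrative.

```python
# Per-operation latencies in seconds, copied from the latency tiers above.
LATENCY_S = {
    "main memory reference": 100e-9,
    "SSD random read": 16e-6,
    "round trip within same data center": 500e-6,
}


def total_ms(operation: str, count: int) -> float:
    """Time for `count` sequential operations, in milliseconds."""
    return LATENCY_S[operation] * count * 1000


for operation in LATENCY_S:
    print(f"1000 x {operation}: {total_ms(operation, 1000):.1f} ms")
```

Roughly 0.1 ms, 16 ms, and 500 ms respectively: the same loop goes from invisible to noticeable to half a second depending on what each iteration touches.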

And one last observation, because it's the heart of this whole module: each tier is roughly 100× slower than the previous one. Treat that as a rule of thumb rather than an exact ratio. By the figures above, main memory is about 200× slower than L1 cache, an SSD random read about 160× slower than main memory, a data-center round trip about 30× slower than an SSD read, and a transatlantic round trip about 300× slower again. Designing systems is largely about deciding which tier each operation belongs in, and minimizing trips down the ladder.
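To check the "roughly 100×" rule against the listed figures, the sketch below computes the ratio between consecutive tiers. Which operation stands in for each tier is my own choice for illustration; picking different representatives would shift the ratios.

```python
# One representative latency per tier, in nanoseconds, from the list above.
TIERS_NS = [
    ("L1 cache reference", 0.5),
    ("main memory reference", 100),
    ("SSD random read", 16_000),
    ("round trip within same data center", 500_000),
    ("packet US -> Europe -> US", 150_000_000),
]


def step_ratios(tiers):
    """Ratio of each tier's latency to the tier before it."""
    return [
        (faster, slower, slower_ns / faster_ns)
        for (faster, faster_ns), (slower, slower_ns) in zip(tiers, tiers[1:])
    ]


ratios = step_ratios(TIERS_NS)
for faster, slower, ratio in ratios:
    print(f"{slower} is ~{ratio:.0f}x slower than {faster}")
```

The steps come out between about 30× and 300×, so "100×" is best read as an order-of-magnitude shorthand.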

§ 06 — Eight words for performance

Vocabulary,
for the limit.

These are the words you'll use when investigating any slow system. Get fluent.

Latency

/ˈleɪtənsi/

Duration of a single operation from start to finish. Measured in ms. Lower is better. What the user feels.

Throughput

/ˈθruːpʊt/

Rate of completed work. Measured in req/s or MB/s. Higher is better. What capacity planning is about.

Round-Trip Time

/raʊnd trɪp taɪm/

Time for a message to go from A to B and a reply to come back. Often abbreviated RTT. At minimum it is twice the one-way propagation delay; processing and queueing add to it.

Bandwidth

/ˈbændwɪdθ/

The maximum rate of data transfer on a link. 1 Gbps = roughly 125MB/sec. Bandwidth is the width of the pipe; latency is the length.

p50 / p95 / p99

/piː fɪfti…/

Percentile latencies. p99 = "99% of requests finished faster than this." Tail latencies (p95, p99) reveal pain that p50 (median) hides.

Jitter

/ˈdʒɪtə/

The variation in latency over time. The average might be 50ms, but if individual requests swing by 200ms the system feels broken. Real-time systems hate jitter.

Propagation

/ˌprɒpəˈɡeɪʃən/

The physical travel time of a signal through a medium. In fiber: ~5ms per 1000km, one way. The component you can't engineer away.

Bandwidth-Delay Product

/ˈbændwɪdθ dɪˈleɪ/

Bandwidth × round-trip time. Measures how much data is "in flight" at once. Used to size TCP buffers and understand throughput on long links.
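Two of the terms above reduce to short calculations: percentile latencies and the bandwidth-delay product. The sketch below uses the nearest-rank percentile definition, which is one of several in common use and an assumption on my part, and a made-up latency sample. The link figures for the bandwidth-delay product (1 Gbps, 150 ms RTT) are likewise illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def bandwidth_delay_product_bytes(bandwidth_bps, rtt_s):
    """Bytes that can be in flight on a link: bandwidth x round-trip time."""
    return bandwidth_bps * rtt_s / 8


# Made-up latency samples in ms: mostly fast, with a slow tail.
latencies_ms = [20] * 90 + [200] * 8 + [1500] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
print(f"mean={mean_ms:.0f} ms  p50={p50} ms  p95={p95} ms  p99={p99} ms")

# Assumed link: 1 Gbps with a 150 ms round trip.
bdp = bandwidth_delay_product_bytes(1e9, 0.150)
print(f"BDP = {bdp / 1e6:.2f} MB in flight")
```

In this sample the median is 20 ms while p99 is 1500 ms, and the 64 ms mean describes no actual request, which is why tail percentiles are reported alongside p50.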

§ 07 — Knowledge check

Five questions.
Mind the physics.

Locking in the intuition. Pick an answer; the explanation appears immediately.



§ 08 — The recap

Three ideas to
carry forward.

Phase B done. These three ideas carry into everything about scale that follows.

i

"Faster" is two questions

Latency (per request) and throughput (per second) are different problems with different fixes. Always know which one you're optimizing.

ii

Physics sets the floor

Light takes ~5ms per 1000km in fiber, one way. A London ↔ Tokyo round trip will always be ~95ms minimum. No engineering reduces that; only proximity does.

iii

Each tier is ~100× slower

CPU → RAM → SSD → network → cross-region. Designing fast systems is about minimizing how many tiers down each operation drops.
