Two numbers, often confused. One you can engineer around. The other is set by physics. Once you can tell them apart — and know where each one breaks — every performance conversation gets sharper.
Walk into any performance review and someone will say "the system needs to be faster." That sentence hides two completely different questions. One asks, "How long does a single request take?" The other asks, "How many requests per second can we handle?" Those are different problems with different solutions, and confusing them is a great way to optimize the wrong thing.
The hidden trap: improving one doesn't always improve the other. Adding more servers boosts throughput but does nothing for latency. Caching results reduces latency but may not move throughput much if the underlying database wasn't the bottleneck. "Faster" is two questions. Always know which one you're answering.
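To make the "more servers" case concrete, here is a minimal sketch under a deliberately simple assumption: every server runs a fixed number of workers, and each worker handles one request at a time. The function name, the worker count, and the 200 ms latency are all illustrative, not taken from any real system.

```python
def max_throughput_rps(servers: int, workers_per_server: int, latency_s: float) -> float:
    """Upper bound on requests/second if every worker is always busy."""
    return servers * workers_per_server / latency_s


latency_s = 0.2  # assumed: a single request takes 200 ms end to end

before = max_throughput_rps(servers=2, workers_per_server=10, latency_s=latency_s)
after = max_throughput_rps(servers=4, workers_per_server=10, latency_s=latency_s)

# Doubling the servers doubles capacity, but no individual request gets quicker.
print(f"2 servers: {before:.0f} req/s, one request still takes {latency_s * 1000:.0f} ms")
print(f"4 servers: {after:.0f} req/s, one request still takes {latency_s * 1000:.0f} ms")
```

The model ignores load balancer overhead and shared bottlenecks such as a database, which is exactly where real systems stop scaling this cleanly.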
Picture a water pipe. Its length determines how long it takes for water to travel from one end to the other — that's latency. Its diameter determines how much water can pass through per second — that's throughput. You can have a long fat pipe (high throughput, high latency) or a short thin one (low throughput, low latency). They're independent properties of the same connection.
A transoceanic fiber link shows both properties at once: it can move an enormous amount of data per second (high throughput), but a single request still pays the round-trip cost (high latency). When you upload a 1 GB video to a server in Sydney, bandwidth determines how long the bulk transfer takes. But when you make a quick API call, the round trip is what you feel, and bandwidth barely matters.
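A rough way to see this split is to model a transfer as one round trip plus the time needed to push the bits through the link. The 250 ms round trip and the 1 Gbps link below are assumed values for illustration, and the model leaves out handshakes, TCP slow start, and packet loss.

```python
def transfer_time_s(size_bytes: float, bandwidth_bps: float, rtt_s: float) -> float:
    """Rough transfer time: one round trip plus time to push the payload."""
    return rtt_s + (size_bytes * 8) / bandwidth_bps


RTT_S = 0.25          # assumed round trip to a distant server
BANDWIDTH_BPS = 1e9   # assumed 1 Gbps link

video_s = transfer_time_s(1e9, BANDWIDTH_BPS, RTT_S)  # 1 GB upload
api_s = transfer_time_s(1e3, BANDWIDTH_BPS, RTT_S)    # 1 KB API call

for name, total in (("1 GB upload", video_s), ("1 KB API call", api_s)):
    rtt_share = RTT_S / total * 100
    print(f"{name}: {total:.3f} s total, round trip is {rtt_share:.1f}% of it")
```

For the upload, nearly all of the time is spent pushing bits, so more bandwidth helps. For the API call, nearly all of it is the round trip, so only moving closer helps.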
This is why "make it faster" needs the follow-up: for one request, or for total traffic? Caching makes a request faster. More servers handle more requests. The two problems have different fixes — and applying the wrong one wastes time and money.
When you make a network request, the total latency is the sum of four distinct delays. Optimizing latency means knowing which of the four is dominating — because the fixes for each are different.
Light travels through fiber at about 200,000 km/s, roughly 2/3 the speed of light in vacuum. This is the floor: you cannot beat it.
For a request that crosses the planet, propagation dominates, because you can't make light go faster. For a request inside the same data center, processing and queueing dominate, because propagation takes only microseconds. For uploads of huge files, transmission dominates, and you need bandwidth. Knowing which is your bottleneck tells you which lever to pull. Tuning the wrong one wastes weeks.
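The sum of the four delays can be sketched in a few lines. The three scenarios and every number in them (distances, payload sizes, link speeds, processing and queueing times) are assumptions chosen to mirror the cases described above. Only the 200,000 km/s figure comes from the text, and the delays are one-way.

```python
from dataclasses import dataclass

FIBER_KM_PER_S = 200_000  # speed of light in fiber, as stated in the text


@dataclass
class Request:
    name: str
    distance_km: float
    size_bytes: float
    bandwidth_bps: float
    processing_s: float
    queueing_s: float

    def delays(self) -> dict:
        """One-way delay in seconds for each of the four components."""
        return {
            "propagation": self.distance_km / FIBER_KM_PER_S,
            "transmission": self.size_bytes * 8 / self.bandwidth_bps,
            "processing": self.processing_s,
            "queueing": self.queueing_s,
        }

    def dominant(self) -> str:
        """Name of the largest component, i.e. the lever worth pulling first."""
        d = self.delays()
        return max(d, key=d.get)


# Hypothetical scenarios; all values are illustrative.
scenarios = [
    Request("cross-planet API call", 15_000, 2_000, 1e9, 0.005, 0.002),
    Request("same data center call", 0.5, 2_000, 10e9, 0.002, 0.001),
    Request("1 GB upload on 100 Mbps", 100, 1e9, 100e6, 0.005, 0.002),
]

for r in scenarios:
    total_ms = sum(r.delays().values()) * 1000
    print(f"{r.name}: {total_ms:.2f} ms total, dominated by {r.dominant()}")
```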
For any city pair, compare the actual route an HTTPS request travels with the theoretical minimum dictated by the speed of light through fiber. The gap is engineering overhead: every router, every TLS handshake, every queue. But the floor is set by physics, not by Cloudflare.
Each route has a different gap between physics and reality. Short-distance routes (same continent) are dominated by routing and processing — engineering matters a lot. Long-distance routes are dominated by propagation — even Google can't beat the speed of light.
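The physics floor for a route can be estimated from the great-circle distance between two cities and the 200,000 km/s fiber speed. This is a sketch, not a route planner: real cables never follow the great circle exactly, so actual minimums are somewhat higher. The city coordinates are approximate, and the helper names are made up for this example.

```python
import math

FIBER_KM_PER_S = 200_000   # speed of light in fiber, as stated in the text
EARTH_RADIUS_KM = 6371.0   # mean Earth radius

# Approximate (latitude, longitude) in degrees.
CITIES = {
    "London": (51.5074, -0.1278),
    "Tokyo": (35.6762, 139.6503),
    "New York": (40.7128, -74.0060),
    "Sydney": (-33.8688, 151.2093),
}


def great_circle_km(a: str, b: str) -> float:
    """Haversine distance between two named cities."""
    lat1, lon1 = map(math.radians, CITIES[a])
    lat2, lon2 = map(math.radians, CITIES[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(a: str, b: str) -> float:
    """Lower bound on round-trip time: there and back at fiber speed."""
    return 2 * great_circle_km(a, b) / FIBER_KM_PER_S * 1000


for pair in (("London", "Tokyo"), ("London", "New York"), ("New York", "Sydney")):
    print(f"{pair[0]} <-> {pair[1]}: {great_circle_km(*pair):,.0f} km, "
          f"RTT floor {min_rtt_ms(*pair):.0f} ms")
```

Any measured round trip above the printed floor is overhead that engineering can attack. The floor itself only moves if the endpoints do.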
In a now-legendary talk, Google's Jeff Dean shared a list of operation latencies that became required reading for backend engineers. The numbers below are updated for modern hardware. Memorize the orders of magnitude; you'll use them in nearly every design decision.
The reason these numbers matter is that they reveal the order of magnitude of every choice you make. A function call that touches memory 1000 times is fast. A function call that hits the SSD 1000 times is slow. One that makes 1000 network calls is broken. Same code, completely different consequences.
And one last observation, because it's the heart of this whole module: each tier is roughly 100× slower than the previous one. RAM is 100× slower than CPU cache. SSD is 100× slower than RAM. Network is 100× slower than SSD. Cross-region network is 100× slower again. Designing systems is largely about deciding which tier each operation belongs in — and minimizing trips down the ladder.
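The ladder is easy to turn into arithmetic. The sketch below takes the "roughly 100× per tier" rule of thumb literally and assumes 1 ns for a CPU cache access. These are illustrative round numbers derived from that rule, not measurements of any particular hardware.

```python
TIERS = ["CPU cache", "RAM", "SSD", "network", "cross-region network"]
BASE_NS = 1   # assumed cost of one CPU cache access
STEP = 100    # the "roughly 100x per tier" rule of thumb from the text


def tier_cost_ns(tier_index: int) -> int:
    """Cost of one operation at the given tier, in nanoseconds."""
    return BASE_NS * STEP ** tier_index


def human(ns: float) -> str:
    """Format nanoseconds using the largest unit that fits."""
    for unit, scale in (("s", 1e9), ("ms", 1e6), ("us", 1e3)):
        if ns >= scale:
            return f"{ns / scale:g} {unit}"
    return f"{ns:g} ns"


for i, tier in enumerate(TIERS):
    one = tier_cost_ns(i)
    print(f"{tier:<22} 1 op: {human(one):>8}   1000 ops: {human(one * 1000):>8}")
```

Under these assumptions, the same loop of 1000 operations costs a fraction of a millisecond in RAM, about 10 ms on SSD, and about a second over the network.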
These are the words you'll use when investigating any slow system. Get fluent.
Latency: measured in ms. Lower is better. What the user feels.
Throughput: measured in req/s or MB/s. Higher is better. What capacity planning is about.
Round-trip time (RTT): there and back, so it doubles the one-way propagation delay.
Bandwidth: 1 Gbps is roughly 125 MB/s. Bandwidth is the width of the pipe; latency is the length.
Percentiles: p99 = "99% of requests finished faster than this." Tail latencies (p95, p99) reveal pain that p50 (median) hides.
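Two of these definitions are easy to check with a few lines of code. The sketch below uses the nearest-rank method for percentiles (one of several common definitions) and a made-up sample of 100 request latencies in which a handful of slow requests hide behind a healthy median.

```python
import math


def percentile(samples, p: int):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (8 bits per byte)."""
    return gbps * 1e9 / 8 / 1e6


# Hypothetical latencies in ms: mostly fast, with a slow tail.
latencies_ms = [20] * 90 + [50] * 5 + [200] * 4 + [2000]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  p95={p95} ms  p99={p99} ms  mean={mean} ms  max={max(latencies_ms)} ms")
print(f"1 Gbps is {gbps_to_mb_per_s(1):g} MB/s")
```

In this sample the median looks fine at 20 ms, while p99 is ten times worse and the slowest request takes two full seconds.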
You can see latency and throughput as separate problems now. Phase B is locked in. Onward to scaling.
Phase B done. These three ideas carry into everything about scale that follows.
Latency (per request) and throughput (per second) are different problems with different fixes. Always know which one you're optimizing.
Light takes ~5 ms to travel 1,000 km in fiber, one way. A London ↔ Tokyo round trip will always take at least ~95 ms. No engineering reduces that; only proximity does.
CPU → RAM → SSD → network → cross-region. Designing fast systems is about minimizing how many tiers down each operation drops.