Module 07 / 20 · Phase B — Data & Storage · 42 min

Caching, and
the art of forgetting.

Indexes make a database fast. Caches make it invisible. The trick: hold onto recent answers so you never have to ask again. The cost: deciding which answers to keep, which to drop, and when to admit yours might be wrong.

// What you'll know by the end

  • Why memory is 100,000× faster than disk
  • The four layers of caching every app touches
  • How hit rate shapes everything else
  • The two genuinely hard problems

§ 01 — A speed gap, in human terms

If RAM is fast,
disk is millennia away.

Computers have a layered memory system. CPU registers are the fastest, RAM is slower, SSDs are slower still, and reaching across the network is comically slow by comparison. The differences aren't 2× or 10× — they're hundreds of thousands of times. Caching exists because of this gap. Every cache is just an attempt to move data up one tier so the next request gets a faster answer.

// ROUND-TRIP TIMES · SCALED FOR HUMAN INTUITION

TIER                      REAL TIME    IF 1ns = 1 SEC
CPU register              ~0.3ns       ~0.3 seconds
L1 cache                  ~1ns         ~1 second
L2 cache                  ~5ns         ~5 seconds
RAM                       ~100ns       ~2 minutes
SSD read                  ~100μs       ~28 hours
Network · same DC         ~500μs       ~6 days
HDD seek                  ~10ms        ~4 months
Network · cross-region    ~80ms        ~2.5 years

Read that last line again. If a single nanosecond were a second, a single cross-region network call would take two and a half years. A RAM access would take two minutes. This is why a cache hit feels instant and a cache miss feels slow. When you "add a cache," you're moving an answer from the four-month tier to the two-minute tier. The performance gain is not subtle.
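The scaling in the table is plain arithmetic, and a few lines of Python can reproduce it. This is a minimal sketch: the function name humanize and the rule "switch to a bigger unit once the value reaches 1.5 of it" are assumptions made for illustration, and a month is treated as 30 days.

```python
# Hypothetical helper: scale a latency so that 1 ns becomes 1 second,
# then express the result in the largest comfortable unit.
UNITS = [
    ("years", 365 * 86400),
    ("months", 30 * 86400),   # assumption: a month is 30 days
    ("days", 86400),
    ("hours", 3600),
    ("minutes", 60),
]

def humanize(nanoseconds: float) -> str:
    seconds = float(nanoseconds)  # 1 ns of real time -> 1 s of "human" time
    for name, size in UNITS:
        # Assumption: move up to a unit once we have at least 1.5 of it.
        if seconds >= 1.5 * size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"

if __name__ == "__main__":
    tiers = {
        "RAM": 100,
        "SSD read": 100_000,
        "Network, same DC": 500_000,
        "HDD seek": 10_000_000,
        "Network, cross-region": 80_000_000,
    }
    for tier, ns in tiers.items():
        print(f"{tier:24} {humanize(ns)}")
```

The printed values land within rounding of the table's right-hand column.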

§ 02 — What a cache actually is

A fast layer
in front of a slow one.

Strip away the jargon and a cache is a very simple idea: a small, fast storage layer that sits between a requester and a slow store, holding recent answers. When the same question comes in twice, the second one gets answered from the fast layer without bothering the slow one. The pattern is identical at every level of computing — CPU caches, browser caches, CDN edges, application caches, database query caches. Same shape, different distances.

// THE TWO PATHS A REQUEST CAN TAKE

CLIENT (asks a question) → CACHE (fast, small, in-memory) → DATABASE (slow, big, on disk)

  1. The client sends a request to the cache.
  2. ✓ HIT: the cache returns the answer in ~1ms. The request never reaches the database.
  3. ✗ MISS: the cache asks the database.
  4. The database answers slowly (~80ms). The cache stores the answer, then replies. Next time someone asks the same thing, it's a hit.

Hit: fast and free. Miss: slow, plus the cost of caching it for next time.

The math that makes caching worth it is simple. If 90% of your requests are hits, your average response time is (0.9 × 1ms) + (0.1 × 80ms) ≈ 9ms. Without the cache, every request would pay the full 80ms. That's a 9× speedup just by remembering. The whole career of cache engineering is in two questions: "can I push hit rate higher?" and "when do I throw stale data away?"
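To see how sharply hit rate moves the average, the same formula can be run for a few values. This is a small sketch that assumes the 1ms hit and 80ms miss figures used above; the function names are illustrative, and like the arithmetic in the text it ignores the cost of checking the cache on a miss.

```python
def average_latency_ms(hit_rate: float, hit_ms: float = 1.0, miss_ms: float = 80.0) -> float:
    """Weighted average of the fast path and the slow path."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

def speedup(hit_rate: float, hit_ms: float = 1.0, miss_ms: float = 80.0) -> float:
    """How many times faster the average is than always paying miss_ms."""
    return miss_ms / average_latency_ms(hit_rate, hit_ms, miss_ms)

if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.9, 0.95, 0.99):
        avg = average_latency_ms(rate)
        print(f"hit rate {rate:>4.0%}: avg {avg:6.2f} ms, speedup {speedup(rate):5.1f}x")
```

Notice that going from 90% to 99% helps far more than going from 50% to 59%: the misses dominate the average, so every point of hit rate near the top removes a large share of the remaining slow requests.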

§ 03 — The four layers · interactive

A request passes
through four caches.

By the time a single page lands in your browser, a typical request has already been intercepted by four separate caches. Each one tries to answer before passing the request deeper. Each card below shows where a layer lives and what it holds.

// LAYER 1 — closest to user

Browser cache

// in the user's machine

Your browser stores static assets (CSS, JS, images) it has fetched before. If Cache-Control says they're still fresh, it serves them without even hitting the network.

How it gets controlled: The server sends Cache-Control: max-age=86400 with a response, and the browser keeps it for a day. ETag headers let the browser cheaply check if a cached asset is still valid without re-downloading.

What it saves: Network round trips for unchanged assets. The reason pages load in a flash when you visit them a second time.
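The server's side of that conversation can be sketched in a few lines. This is a simplified illustration, not how any particular web server does it: respond and make_etag are hypothetical helpers, and deriving the ETag from a hash of the body is just one common choice.

```python
import hashlib
from typing import Dict, Optional, Tuple

MAX_AGE_SECONDS = 86400  # one day, matching the max-age example above

def make_etag(body: bytes) -> str:
    # Assumption: the ETag is a short content hash. Real servers may use
    # file modification times, version numbers, or other schemes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request for this asset."""
    etag = make_etag(body)
    headers = {
        "Cache-Control": f"max-age={MAX_AGE_SECONDS}",
        "ETag": etag,
    }
    if if_none_match == etag:
        # The browser's copy is still valid: confirm it without resending the body.
        return 304, headers, b""
    return 200, headers, body

if __name__ == "__main__":
    css = b"body { color: red }"
    status, headers, _ = respond(css, None)             # first visit
    print(status, headers)
    status, _, payload = respond(css, headers["ETag"])  # revalidation
    print(status, len(payload))
```

While the max-age window is open, the browser does not contact the server at all. The ETag check only happens once the cached copy has gone stale.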
// LAYER 2 — geographically distributed

CDN edge cache

// data centers worldwide

Content delivery networks (Cloudflare, Fastly, Akamai) keep copies of your static assets in hundreds of edge locations. Users hit the nearest copy — often within their own city.

How it works: The first time someone in Tokyo requests logo.png, the CDN fetches it from your origin server, caches it at the Tokyo edge, and serves it. Every Tokyo user after that gets it locally — under 10ms instead of 200ms across the ocean.

What it saves: Bandwidth on your origin, plus latency for global users. Worth deeper coverage in Module 12.
// LAYER 3 — inside your app

Application cache

// usually Redis or Memcached

A separate fast store your application checks before hitting the database. Holds whatever you choose — query results, session data, computed values, rate-limit counters.

How it works: Your code does cache.get("user:42") first. If hit, return it. If miss, query the DB, store in cache with a TTL, return. This is the cache-aside pattern — the most common in production.

What it saves: Database load on your hottest data. A well-tuned Redis layer often deflects 95%+ of read traffic from the primary DB.
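Here is the cache-aside flow as a minimal sketch. To keep it runnable without a server, TTLCache is a hypothetical in-memory stand-in for Redis or Memcached, and query_db is whatever function your application uses to reach the database; neither name comes from a real library.

```python
import time

class TTLCache:
    """In-memory stand-in for Redis/Memcached (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

def get_user(user_id, cache, query_db, ttl_seconds=300):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # hit: the database is never touched
    user = query_db(user_id)               # miss: go to the slow store
    cache.set(key, user, ttl_seconds)      # remember the answer for next time
    return user

if __name__ == "__main__":
    def slow_query(user_id):
        time.sleep(0.08)  # pretend this is an 80ms database call
        return {"id": user_id, "name": "Ada"}

    cache = TTLCache()
    for attempt in (1, 2):
        start = time.perf_counter()
        get_user(42, cache, slow_query)
        print(f"request {attempt}: {(time.perf_counter() - start) * 1000:.1f} ms")
```

The application code owns the decision about what gets cached and for how long. The cache itself knows nothing about the database.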
// LAYER 4 — inside the database

Database buffer pool

// RAM inside Postgres / MySQL

Even when you "go to the database," you usually don't go to the disk. The database keeps recently-read pages in RAM. Most queries are answered from this buffer without disk I/O.

How it works: Postgres calls it "shared buffers" (the shared_buffers setting). MySQL's InnoDB calls it the "buffer pool." Either way, it's RAM holding recently-touched data and index pages. This is why a query on a cold start is slow but the same query a second later is fast.

What it saves: Disk I/O — which is the slowest thing a database does. Tuning buffer pool size is one of the biggest performance levers a DBA has.

Notice the pattern: each layer holds less data but answers faster than the one beneath it. A request that misses Layer 1 tries Layer 2, then Layer 3, then Layer 4, then finally falls to disk. The deeper you go, the slower each step is. A well-designed system makes sure the popular data lives high up where the answers are fast.

§ 04 — Cache simulator · interactive lab

Watch a cache
earn its keep.

Below: a live cache sitting between a client and a slow database. Pick a workload pattern, a cache size, and an eviction policy. Press Start and watch requests stream through. The hit rate climbs as the cache warms up — and the access pattern decides whether it climbs to 30% or 95%.

// CACHE SIMULATOR · m.07 lab

Controls: pattern, cache size, eviction policy. Live stats: hit rate, average latency, hits, misses. Cache contents are listed in LRU order. Diagram: CLIENT (user requests) → CACHE (~1ms, in RAM) → DATABASE (~80ms, slow). Default pattern: hot keys, where 80% of requests hit ~20% of keys.

The three workloads behave very differently. Hot keys (Zipfian) is what real internet traffic looks like — a few popular keys dominate, and caching wins big. Uniform spreads requests evenly across a huge keyspace; small caches barely help. Scan reads every key once, defeating any LRU cache. Cache effectiveness is mostly about matching policy to pattern.
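The three workloads can also be compared offline with a tiny simulation. This is a rough sketch rather than the lab's actual code: the keyspace of 1,000 keys, the 80/20 split standing in for a Zipfian distribution, the looping scan, and all function names are assumptions chosen for illustration.

```python
import random
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest first, most recent last

    def access(self, key):
        """Return True on a hit. On a miss, insert the key and evict if full."""
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return True
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return False

def hot_keys(n_keys, rng):
    """80% of requests go to the first 20% of keys."""
    hot = max(1, n_keys // 5)
    while True:
        if rng.random() < 0.8:
            yield rng.randrange(hot)
        else:
            yield rng.randrange(hot, n_keys)

def uniform(n_keys, rng):
    """Every key is equally likely."""
    while True:
        yield rng.randrange(n_keys)

def scan(n_keys, rng):
    """Read every key in order, over and over (rng is unused)."""
    while True:
        yield from range(n_keys)

def hit_rate(workload, cache_size, n_keys=1000, n_requests=50_000, seed=7):
    rng = random.Random(seed)
    cache = LRUCache(cache_size)
    stream = workload(n_keys, rng)
    hits = sum(cache.access(next(stream)) for _ in range(n_requests))
    return hits / n_requests

if __name__ == "__main__":
    for workload in (hot_keys, uniform, scan):
        print(f"{workload.__name__:9} hit rate: {hit_rate(workload, cache_size=200):.1%}")
```

With a cache that holds 20% of the keyspace, the uniform workload settles near 20%, the hot-key workload does much better, and the scan never hits at all, because each key is evicted before the scan comes back around to it.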

§ 05 — The two genuinely hard problems

There are only
two hard things.

Phil Karlton said it best: "There are only two hard things in computer science: cache invalidation and naming things." The joke endures because cache invalidation really is brutal. You decide to cache an answer; the underlying data changes; now your cache is wrong. Worse — you don't know it's wrong. You return the stale answer to a user. They see something out of date. Multiply across users and now you have an incident.

Problem 1 · Staleness

// when the cached answer is wrong

You cache a user's email. They update it in the settings page. The next 60 requests still see the old email — because the cache hasn't been invalidated. This is the canonical hard problem.

The trade-off: caching longer = better performance, but more chances to serve stale data. Caching shorter = fresher data, but more database load. Every system has to pick a point on this curve, often per kind of data.

Problem 2 · The stampede

// when many misses arrive at once

A popular cache key expires. Suddenly 1000 requests hit, all see a miss, all hit the database simultaneously. Your DB — which the cache was protecting — gets crushed under sudden load. The cache failure cascades.

Mitigations: stagger TTLs with jitter, use a lock so only one request refills the cache while others wait, or serve a slightly-stale value while refreshing in the background. The good news: it's a known problem with known fixes.
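Two of those mitigations, jittered TTLs and a refill lock, fit in a short sketch. This is one possible shape under simplifying assumptions: a single process, threads instead of separate servers, and hypothetical names (SingleFlightCache, jittered_ttl, loader). A multi-server setup would need a distributed lock instead.

```python
import random
import threading
import time

def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Vary the TTL by +/- spread (a fraction) so keys do not expire in lockstep."""
    return base_seconds * (1 + rng.uniform(-spread, spread))

class SingleFlightCache:
    """On a miss, only one thread per key calls the loader; the rest wait."""

    def __init__(self, loader, ttl_seconds=60.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._data = {}    # key -> (value, expires_at)
        self._locks = {}   # key -> per-key refill lock
        self._guard = threading.Lock()  # protects the _locks dict

    def _fresh(self, key):
        entry = self._data.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have refilled while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            expires_at = time.monotonic() + jittered_ttl(self._ttl)
            self._data[key] = (value, expires_at)
            return value

if __name__ == "__main__":
    db_calls = []

    def slow_loader(key):
        db_calls.append(key)
        time.sleep(0.05)  # pretend this is the database
        return key.upper()

    cache = SingleFlightCache(slow_loader)
    threads = [threading.Thread(target=cache.get, args=("popular",)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"50 concurrent requests, {len(db_calls)} database call(s)")
```

The trade-off is that waiting requests are held up for the duration of one refill. Serving a slightly stale value while refreshing in the background avoids that wait at the cost of more moving parts.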

// FOUR STRATEGIES FOR INVALIDATING A CACHE

TTL / tee tee ell /
Set each entry to expire after some time (EXPIRE key 600). Simple and automatic, but you accept staleness for up to the length of the TTL. Best for data that only needs to be "kinda fresh": feed contents, public stats, search results.
Write-through / rīt-throo /
Update the cache at the moment you update the database. Same code path writes both, so the cache agrees with the DB as long as both writes succeed. Costs an extra write per change; best for hot data where correctness matters.
Write-behind / rīt-bīnd /
Writes go to the cache first; the cache flushes them to the DB later in the background. Wickedly fast for the writer, but if the cache crashes between write and flush, you lose data. Use cautiously.
Manual invalidation / man-yoo-əl /
Your code explicitly calls cache.delete("user:42") when user 42 changes. Most accurate, most labor — easy to forget a code path and end up with stale entries. Often combined with a backup TTL.
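Write-through and manual invalidation differ by a single line on the write path, which a sketch makes easy to see. Everything here is illustrative: Cache is a hypothetical in-memory stand-in, the database is a plain dict, and BACKUP_TTL is an assumed value for the safety-net expiry mentioned above.

```python
import time

BACKUP_TTL = 600  # assumption: a 10-minute safety net for forgotten code paths

class Cache:
    """In-memory stand-in for Redis/Memcached (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or time.monotonic() >= entry[1]:
            self._data.pop(key, None)
            return None
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)

def read_user(db, cache, user_id):
    """Cache-aside read, used by both strategies."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = db[user_id]
        cache.set(key, user, BACKUP_TTL)
    return user

def update_email_write_through(db, cache, user_id, email):
    db[user_id] = {"email": email}
    cache.set(f"user:{user_id}", db[user_id], BACKUP_TTL)  # write the cache too

def update_email_invalidate(db, cache, user_id, email):
    db[user_id] = {"email": email}
    cache.delete(f"user:{user_id}")  # next read misses and refills from the DB

if __name__ == "__main__":
    db = {42: {"email": "old@example.com"}}
    cache = Cache()
    read_user(db, cache, 42)                 # warms the cache

    db[42] = {"email": "oops@example.com"}   # a code path that forgot the cache
    print("forgot to invalidate:", read_user(db, cache, 42)["email"])

    update_email_invalidate(db, cache, 42, "new@example.com")
    print("manual invalidation: ", read_user(db, cache, 42)["email"])

    update_email_write_through(db, cache, 42, "newer@example.com")
    print("write-through:       ", read_user(db, cache, 42)["email"])
```

The first print shows the failure mode the backup TTL exists for: a write that skipped the cache leaves the old value in place until the entry expires.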

The honest meta-strategy: cache the things that change rarely or matter less if they're slightly stale. Product descriptions, search results, leaderboards, user profiles — easy wins. Account balances, permission checks, payment status — be very careful. The art is in picking which side of that line each piece of data belongs on.

§ 06 — Eight words for the cache layer

Vocabulary,
for the in-memory life.

These show up in every incident review and architecture meeting. Get fluent.

Hit / Miss
/hɪt · mɪs/
A hit is when the cache had what was asked for. A miss is when it didn't and had to fetch from the slow source. Hit rate is the percentage of requests that hit.
TTL
/tee tee ell/
"Time to live." How long a cache entry is considered fresh before it's automatically thrown away. SET key value EX 300 = expires in 5 minutes.
LRU
/el ar yoo/
"Least Recently Used." An eviction policy: when the cache is full, throw out the entry that has gone untouched the longest. A common default because it matches typical access patterns.
Invalidation
/ɪnˌvælɪˈdeɪʃən/
Marking a cached entry as no-longer-valid so the next request goes to the source. The hard problem — because when to invalidate is rarely obvious.
Stampede
/stæmˈpiːd/
When a popular key expires and many requests miss simultaneously, all hitting the slow source at once. Also called "thundering herd." Mitigated with jittered TTLs and locks.
Cache-aside
/kæʃ əˈsaɪd/
The most common pattern: app checks cache, falls back to DB on miss, then populates cache. App code, not the cache, decides what gets cached.
Eviction
/ɪˈvɪkʃən/
Removing an entry to make room for a new one. Choosing which entry to evict is the cache's main intelligence. LRU, LFU, and FIFO are the common policies.
Warm / Cold
/wɔːm · koʊld/
A "cold" cache is empty or stale — every request is a miss. A "warm" cache has been running long enough that hit rate is high. Cold-start performance is a common pain.

§ 07 — Knowledge check

Five questions.
Mind the staleness.

Locking in the intuition. Pick an answer; the explanation drops in instantly.

§ 08 — The recap

Three ideas to
carry forward.

Caching is the most universally useful performance technique in computing. Carry these:

i

A cache is a fast layer over a slow one

The whole game is moving popular answers up the speed pyramid so the next request avoids the slow tier entirely.

ii

Hit rate is everything

90% hit rate is a 9× speedup. 99% is about 45× (80ms ÷ 1.79ms, with a 1ms hit and an 80ms miss). Access patterns decide the ceiling more than cache size does.

iii

Invalidation is the hard part

Adding a cache is easy. Knowing when its answers stopped being true is genuinely hard. Pick a strategy per kind of data.

↓ UP NEXT

M.08 — Latency,
Throughput & the Speed of Light.

You've seen how caching reshapes latency. Now let's look at the physical limits beneath it. Why a "fast network" still has minimum delays. Why throughput and latency are different problems. And why packets from London to Tokyo can never beat the speed of light.
