Indexes make a database fast. Caches make it invisible. The trick: hold onto recent answers so you never have to ask again. The cost: deciding which answers to keep, which to drop, and when to admit yours might be wrong.
Computers have a layered memory system. CPU registers are the fastest, RAM is slower, SSDs are slower still, and reaching across the network is comically slow by comparison. The differences aren't 2× or 10× — they're hundreds of thousands of times. Caching exists because of this gap. Every cache is just an attempt to move data up one tier so the next request gets a faster answer.
To feel the size of that gap, stretch time: if one nanosecond lasted a full second, a single cross-region network call would take about two and a half years, while a RAM access would take about two minutes. This is why a cache hit feels instant and a cache miss feels slow. When you "add a cache," you're moving an answer from the four-month tier (about 10 ms of real time) to the two-minute tier. The performance gain is not subtle.
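The stretched-time figures are plain arithmetic: treat each nanosecond of real latency as one second. The sketch below redoes that conversion. The three latencies (100 ns for RAM, 10 ms for the four-month tier, 80 ms for a cross-region call) are assumed round numbers chosen to match the analogy, not measurements, and `stretched` is a hypothetical helper.

```python
# Assumed, order-of-magnitude latencies in nanoseconds (illustrative, not measured).
LATENCIES_NS = {
    "RAM access": 100,
    "10 ms tier": 10_000_000,
    "cross-region network call": 80_000_000,
}

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def stretched(ns: float) -> str:
    """Describe a latency as if every nanosecond lasted one full second."""
    seconds = ns  # the analogy: 1 ns of real time becomes 1 s
    if seconds < 3_600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / 3_600:.1f} hours"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_DAY:.0f} days"
    return f"{seconds / SECONDS_PER_YEAR:.1f} years"


for name, ns in LATENCIES_NS.items():
    print(f"{name:>26}: {stretched(ns)}")
```

With these assumptions the RAM access comes out at 1.7 minutes, the 10 ms tier at 116 days (roughly four months), and the cross-region call at 2.5 years.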
Strip away the jargon and a cache is a very simple idea: a small, fast storage layer that sits between a requester and a slow store, holding recent answers. When the same question comes in twice, the second one gets answered from the fast layer without bothering the slow one. The pattern is identical at every level of computing — CPU caches, browser caches, CDN edges, application caches, database query caches. Same shape, different distances.
The math that makes caching worth it is simple. If 90% of your requests are cache hits at 1ms and the remaining 10% go to an 80ms database, your average response time is (0.9 × 1ms) + (0.1 × 80ms) ≈ 9ms. Without the cache, every request pays the full 80ms. That's a 9× speedup just by remembering. All of cache engineering comes down to two questions: "can I push the hit rate higher?" and "when do I throw stale data away?"
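Here is a small sketch of the same expected-latency formula, using the 1 ms hit and 80 ms miss figures from the paragraph above. Like the paragraph, it ignores the extra cache lookup a miss pays before reaching the database; `average_latency_ms` and `speedup` are hypothetical helper names.

```python
def average_latency_ms(hit_rate: float, hit_ms: float = 1.0, miss_ms: float = 80.0) -> float:
    """Expected latency: hits cost hit_ms, misses cost miss_ms (the slow store)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def speedup(hit_rate: float, hit_ms: float = 1.0, miss_ms: float = 80.0) -> float:
    """How many times faster than sending every request to the slow store."""
    return miss_ms / average_latency_ms(hit_rate, hit_ms, miss_ms)


for rate in (0.5, 0.9, 0.99):
    print(f"hit rate {rate:.0%}: {average_latency_ms(rate):.2f} ms, {speedup(rate):.1f}x")
```

At 99% the same formula gives about 1.79 ms, roughly a 45× speedup: once hits are cheap, the rare misses dominate the average, so each extra point of hit rate is worth more than the last.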
By the time a single page lands in your browser, the requests behind it may have passed through as many as four separate caches. Each one tries to answer before passing the request deeper.
Layer 1: Browser cache. Your browser stores static assets (CSS, JS, images) it has fetched before. If Cache-Control says they're still fresh, it serves them without even touching the network.
Send Cache-Control: max-age=86400 with a response, and the browser treats it as fresh for a day. ETag headers let the browser cheaply check whether a cached asset is still valid without re-downloading it.

Layer 2: CDN edge. Content delivery networks (Cloudflare, Fastly, Akamai) keep copies of your static assets in hundreds of edge locations. Users hit the nearest copy, often within their own city.
The first time a user in Tokyo requests logo.png, the CDN fetches it from your origin server, caches it at the Tokyo edge, and serves it. Every Tokyo user after that gets it locally: under 10ms instead of 200ms across the ocean.

Layer 3: Application cache. A separate fast store your application checks before hitting the database. It holds whatever you choose: query results, session data, computed values, rate-limit counters.
On each read, your code calls cache.get("user:42") first. On a hit, it returns the cached value. On a miss, it queries the database, stores the result in the cache with a TTL, and returns it. This is the cache-aside pattern, the most common one in production.

Layer 4: Database buffer pool. Even when you "go to the database," you usually don't go to the disk. The database keeps recently read pages in RAM, and most queries are answered from this buffer without any disk I/O.
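Here is a minimal, single-process sketch of the cache-aside read path described for the application cache. `TTLCache`, `get_user`, and the injectable clock are hypothetical stand-ins: a production system would usually keep entries in a shared cache server rather than a Python dict, and this sketch treats a `None` result as a miss, so it cannot cache "not found" answers.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for an application cache (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_user(user_id: int, cache: TTLCache, query_db: Callable[[int], dict],
             ttl_seconds: float = 300.0) -> dict:
    """Cache-aside read: check the cache, fall back to the database, then fill the cache."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is not None:
        return user  # hit: the database is never touched
    user = query_db(user_id)  # miss: pay the slow query once
    cache.set(key, user, ttl_seconds)  # later readers hit until the TTL runs out
    return user


# Usage with a fake clock and a fake database that counts its calls.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def query_db(user_id: int) -> dict:
    db_calls.append(user_id)
    return {"id": user_id, "email": "ada@example.com"}


get_user(42, cache, query_db)  # miss: first database call
get_user(42, cache, query_db)  # hit
now[0] += 301                  # move past the 300 s TTL
get_user(42, cache, query_db)  # expired: second database call
print(len(db_calls))  # 2
```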
Notice the pattern: each layer holds less data but answers faster than the one beneath it. A request that misses Layer 1 tries Layer 2, then Layer 3, then Layer 4, then finally falls to disk. The deeper you go, the slower each step is. A well-designed system makes sure the popular data lives high up where the answers are fast.
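The fall-through lookup can be sketched as a loop over tiers ordered fastest first. On a hit, the answer is copied into every faster tier so the next request stops higher up. The tier names, per-lookup costs, and the `Tier`/`lookup` helpers are illustrative assumptions, and unlike real layers these tiers never fill up or evict anything.

```python
from typing import Callable, Dict, List, Tuple


class Tier:
    """One cache layer: a name, an assumed per-lookup cost, and its stored answers.

    Real tiers are size-bounded and evict entries; this sketch keeps everything.
    """

    def __init__(self, name: str, cost_ms: float) -> None:
        self.name = name
        self.cost_ms = cost_ms
        self.items: Dict[str, str] = {}


def lookup(key: str, tiers: List[Tier], disk: Callable[[str], str],
           disk_ms: float) -> Tuple[str, str, float]:
    """Check tiers fastest first; return (value, where it was found, total cost in ms)."""
    spent_ms = 0.0
    for depth, tier in enumerate(tiers):
        spent_ms += tier.cost_ms  # every tier consulted costs time, hit or miss
        if key in tier.items:
            value = tier.items[key]
            for faster in tiers[:depth]:
                faster.items[key] = value  # promote so the next request stops higher
            return value, tier.name, spent_ms
    spent_ms += disk_ms  # missed every tier: fall through to disk
    value = disk(key)
    for tier in tiers:
        tier.items[key] = value
    return value, "disk", spent_ms


def read_from_disk(key: str) -> str:
    return f"contents of {key}"


tiers = [Tier("tier-1", 0.1), Tier("tier-2", 1.0), Tier("tier-3", 5.0)]
first = lookup("logo.png", tiers, read_from_disk, disk_ms=50.0)
second = lookup("logo.png", tiers, read_from_disk, disk_ms=50.0)
print(first[1], round(first[2], 1))    # disk 56.1
print(second[1], round(second[2], 1))  # tier-1 0.1
```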
Picture a cache sitting between a client and a slow database. Three knobs shape its behavior: the workload pattern, the cache size, and the eviction policy. The hit rate climbs as the cache warms up, and the access pattern decides whether it levels off near 30% or near 95%.
The three workloads behave very differently. Hot keys (Zipfian) is what real internet traffic looks like: a few popular keys dominate, and caching wins big. Uniform spreads requests evenly across a huge keyspace, so small caches barely help. Scan reads every key in turn, so nothing is requested again while it is still cached; an LRU cache smaller than the keyspace gets no hits at all, and the scan pushes the hot keys out on its way through. Cache effectiveness is mostly about matching policy to pattern.
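The workload effect is easy to reproduce offline. The sketch below replays three synthetic request streams through an LRU cache of 100 entries over 10,000 keys. The sizes, the Zipf exponent of 1.0, the random seed, and the helper names are all assumptions, and LRU is the only eviction policy modeled. Here "scan" means reading the keys in order, over and over; because the keyspace is larger than the cache, every key has been evicted by the time the scan comes back to it.

```python
import random
from collections import OrderedDict
from itertools import accumulate
from typing import Iterable, List


def lru_hit_rate(requests: Iterable[int], capacity: int) -> float:
    """Replay a request stream through an LRU cache and return the fraction of hits."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # most recently used goes to the end
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


def zipfian(n_keys: int, n_requests: int, rng: random.Random, s: float = 1.0) -> List[int]:
    """Hot keys: key k is requested with probability proportional to 1 / (k + 1) ** s."""
    cum_weights = list(accumulate(1.0 / (rank ** s) for rank in range(1, n_keys + 1)))
    return rng.choices(range(n_keys), cum_weights=cum_weights, k=n_requests)


def uniform(n_keys: int, n_requests: int, rng: random.Random) -> List[int]:
    """Every key is equally likely."""
    return [rng.randrange(n_keys) for _ in range(n_requests)]


def scan(n_keys: int, n_requests: int) -> List[int]:
    """Read keys 0, 1, 2, ... in order, wrapping around at the end of the keyspace."""
    return [i % n_keys for i in range(n_requests)]


N_KEYS, N_REQUESTS, CAPACITY = 10_000, 50_000, 100
rng = random.Random(7)
results = {
    "hot keys (Zipfian)": lru_hit_rate(zipfian(N_KEYS, N_REQUESTS, rng), CAPACITY),
    "uniform": lru_hit_rate(uniform(N_KEYS, N_REQUESTS, rng), CAPACITY),
    "scan": lru_hit_rate(scan(N_KEYS, N_REQUESTS), CAPACITY),
}
for name, rate in results.items():
    print(f"{name:>18}: {rate:.1%}")
```

Expect the Zipfian stream to land far above the uniform one (which hovers near the 1% that a 100-of-10,000 cache implies) and the scan to score exactly zero. The precise percentages depend on the assumed sizes and skew.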
Phil Karlton said it best: "There are only two hard things in computer science: cache invalidation and naming things." The joke endures because cache invalidation really is brutal. You decide to cache an answer; the underlying data changes; now your cache is wrong. Worse — you don't know it's wrong. You return the stale answer to a user. They see something out of date. Multiply across users and now you have an incident.
You cache a user's email. They update it on the settings page. Until that entry expires, every request still sees the old email, because nothing invalidated the cache. This is the canonical hard problem.
The trade-off: caching longer = better performance, but more chances to serve stale data. Caching shorter = fresher data, but more database load. Every system has to pick a point on this curve, often per kind of data.
This is a cache stampede. A popular cache key expires, 1,000 requests arrive at once, all of them miss, and all of them hit the database simultaneously. Your database, the very thing the cache was protecting, gets crushed under the sudden load, and the cache failure cascades.
Mitigations: stagger TTLs with jitter, use a lock so only one request refills the cache while others wait, or serve a slightly-stale value while refreshing in the background. The good news: it's a known problem with known fixes.
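Two of those mitigations, TTL jitter and a per-key lock so only one caller refills a missing entry, fit in a short sketch. `SingleFlightCache`, `jittered_ttl`, `slow_query`, and the ±10% spread are illustrative assumptions. The locking only works within one process (a fleet of servers would need a distributed lock or similar), and the serve-stale-while-refreshing option is not shown.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


def jittered_ttl(base_seconds: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Randomize a TTL by +/- spread so entries written together do not expire together."""
    source = rng if rng is not None else random
    return base_seconds * (1.0 + source.uniform(-spread, spread))


class SingleFlightCache:
    """In-process cache where only one caller per key runs the loader on a miss."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects _key_locks

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key: str) -> Optional[Tuple[Any, float]]:
        entry = self._entries.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry
        return None

    def get_or_load(self, key: str, loader: Callable[[], Any], ttl_seconds: float) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]  # fast path: fresh hit, no locking
        with self._lock_for(key):
            entry = self._fresh(key)  # re-check: another caller may have refilled it while we waited
            if entry is not None:
                return entry[0]
            value = loader()  # only the lock holder queries the slow store
            self._entries[key] = (value, time.monotonic() + jittered_ttl(ttl_seconds))
            return value


db_calls = 0
db_calls_lock = threading.Lock()


def slow_query() -> dict:
    """Hypothetical expensive database query that counts how often it runs."""
    global db_calls
    with db_calls_lock:
        db_calls += 1
    time.sleep(0.05)
    return {"top_products": [1, 2, 3]}


cache = SingleFlightCache()
workers = [threading.Thread(target=cache.get_or_load, args=("home:top", slow_query, 60.0))
           for _ in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(db_calls)  # 1: a hundred concurrent misses, one database query
```

The re-check inside the lock is the important step: callers that queued behind the first loader find the refilled entry and return it instead of querying again.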
TTL expiry: every entry expires after N seconds (in Redis, EXPIRE key 600). Simple and automatic, but you accept up to N seconds of staleness. Best for data that only needs to be "kinda fresh": feed contents, public stats, search results.

Explicit invalidation: call cache.delete("user:42") whenever user 42 changes. Most accurate, but also the most labor: it is easy to forget a code path and end up with stale entries, so it is often combined with a backup TTL.

The honest meta-strategy: cache the things that change rarely or matter less if they're slightly stale. Product descriptions, search results, leaderboards, user profiles: easy wins. Account balances, permission checks, payment status: be very careful. The art is in picking which side of that line each piece of data belongs on.
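To compare TTL-only expiry with explicit invalidation side by side, the sketch below replays the email example: warm the cache, update the email, read again, then read once more after the TTL has passed. Everything here (`FakeClock`, `TTLCache`, `UserStore`, the 600-second TTL) is a hypothetical in-memory stand-in.

```python
from typing import Dict, List, Optional, Tuple


class FakeClock:
    """Manually advanced clock so expiry is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0


class TTLCache:
    def __init__(self, clock: FakeClock) -> None:
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None or self.clock.now >= entry[1]:
            self.entries.pop(key, None)  # missing or expired: a miss
            return None
        return entry[0]

    def set(self, key: str, value: str, ttl: float) -> None:
        self.entries[key] = (value, self.clock.now + ttl)

    def delete(self, key: str) -> None:
        self.entries.pop(key, None)


class UserStore:
    """Cache-aside reads of user emails, with optional invalidation on write."""

    def __init__(self, cache: TTLCache, invalidate_on_write: bool, ttl: float = 600.0) -> None:
        self.db: Dict[int, str] = {}
        self.cache = cache
        self.invalidate_on_write = invalidate_on_write
        self.ttl = ttl  # the backup TTL bounds staleness even if a delete is missed

    def read_email(self, user_id: int) -> str:
        key = f"user:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        email = self.db[user_id]
        self.cache.set(key, email, self.ttl)
        return email

    def update_email(self, user_id: int, email: str) -> None:
        self.db[user_id] = email  # update the source of truth first
        if self.invalidate_on_write:
            self.cache.delete(f"user:{user_id}")  # explicit invalidation


def demo(invalidate_on_write: bool) -> List[str]:
    clock = FakeClock()
    store = UserStore(TTLCache(clock), invalidate_on_write)
    store.db[42] = "old@example.com"
    seen = [store.read_email(42)]        # warms the cache
    store.update_email(42, "new@example.com")
    seen.append(store.read_email(42))    # right after the update
    clock.now += 601                     # just past the 600 s TTL
    seen.append(store.read_email(42))
    return seen


print(demo(invalidate_on_write=False))  # ['old@example.com', 'old@example.com', 'new@example.com']
print(demo(invalidate_on_write=True))   # ['old@example.com', 'new@example.com', 'new@example.com']
```

With TTL alone, the stale email survives until the entry expires; with the delete on write, the very next read fetches the new value, and the TTL is still there as a safety net.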
These terms show up in every incident review and architecture meeting. Get fluent.
TTL (time to live): how long an entry may be served before it expires. In Redis, SET key value EX 300 means the key expires in 5 minutes.
Caching is the most universally useful performance technique in computing. Carry these:
The whole game is moving popular answers up the speed pyramid so the next request avoids the slow tier entirely.
A 90% hit rate is a 9× speedup; 99% is about 45×. Access patterns decide the ceiling more than cache size does.
Adding a cache is easy. Knowing when its answers stopped being true is genuinely hard. Pick a strategy per kind of data.