You've learned every piece. Now design a real one. bit.ly looks simple — type a long URL, get back bit.ly/x7k, click it, get redirected. Underneath: every concept from the last 16 modules, composed.
A URL shortener does exactly two things. Shorten: take a long URL, give back a short code. Redirect: take a short code, send the user to the long URL. That's it. So why is bit.ly a 9-figure business and why does this question show up in nearly every senior engineering interview? Because "two operations" hides every distributed-systems problem you've ever learned about — and a few you haven't.
The deceptive bit: reads massively outnumber writes. Someone shortens a URL once; tens, hundreds, sometimes millions of people click it. The read/write ratio is often 100:1 or more. That asymmetry shapes every architecture decision below. Reads need to be blisteringly fast and infinitely scalable; writes need to be correct, unique, and cheap to store at volume. Different problems, different solutions, same system.
Over the next four sections we'll scope the problem, estimate the load, design the API, pick a short-code strategy, and then watch the architecture evolve from a single laptop to a global service — adding one component at a time, each justified by an actual scaling pain.
The mistake juniors make in design interviews, and in real life, is jumping to "let's use Kafka and Cassandra" before knowing whether they actually need either. Senior engineers do the opposite: requirements, then estimation, then design. The numbers tell you what's hard. Without them, every architecture decision is decoration.
POST a long URL, return a short code
GET a short code, send 302 to long URL
/my-talk)
Now the math. We don't need precision; we need order-of-magnitude estimates to know what tools we'll need. Senior engineers do this on a napkin in 60 seconds. Assume bit.ly-scale, post-launch.
Writes: 100M new URLs per day ÷ 86,400 seconds ≈ 1,200 writes/sec
Reads: 100:1 read-to-write ratio × 1,200 writes/sec = 120k reads/sec
Daily storage: 500 bytes per URL × 100M URLs/day = 50 GB/day
Total storage: 50 GB/day × 365 days × 5 years horizon ≈ 90 TB
What these numbers tell us, without writing any code: writes are easy (1,200/sec is a single Postgres box), reads are the problem (120k/sec needs caching and horizontal scaling), storage isn't crazy (90 TB is large but fits in a single beefy DB cluster), and the latency budget is tight (50ms p99 for an internet round-trip means almost no margin for slow queries). Now we know what we're solving for.
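The napkin arithmetic above can be reproduced in a few lines. This is only a sketch: every input is one of the assumptions stated in the text, and the variable names are made up for illustration.

```python
# Back-of-envelope load estimate. All inputs are the assumptions from the text.
SECONDS_PER_DAY = 86_400

new_urls_per_day = 100_000_000  # 100M new URLs per day
read_write_ratio = 100          # 100:1 reads to writes
bytes_per_url = 500             # stored size of one mapping
horizon_years = 5               # planning horizon

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio
storage_per_day_gb = new_urls_per_day * bytes_per_url / 1e9
storage_total_tb = storage_per_day_gb * 365 * horizon_years / 1e3

print(f"writes/sec:    {writes_per_sec:,.0f}")
print(f"reads/sec:     {reads_per_sec:,.0f}")
print(f"storage/day:   {storage_per_day_gb:,.0f} GB")
print(f"storage total: {storage_total_tb:,.2f} TB")
```

The exact results (about 1,157 writes/sec, about 115,700 reads/sec, 50 GB/day, 91.25 TB) round to the 1,200, 120k and 90 TB figures used here. At napkin precision only the order of magnitude matters.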
The API is small. The interesting decision is buried in step one: how do you generate the short code? Several strategies work; they have very different scaling characteristics. Let's nail the API first, then dig into the trade-off.
request: { "url": "https://example.com/very/long/path?utm=..." }
response: { "short": "x7k", "full": "https://syf.gg/x7k" }
request: GET /x7k
response: HTTP/1.1 302 Found
Location: https://example.com/very/long/path?utm=...
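To make the two operations concrete, here is a minimal in-memory sketch of both handlers. The function names, the dict standing in for the database, the 404 for unknown codes, and the placeholder code scheme (hex of a counter) are all assumptions for illustration. Only the request and response shapes come from the example above.

```python
from itertools import count

BASE_URL = "https://syf.gg"  # host taken from the example response
_mappings = {}               # stand-in for the real database
_ids = count(1)              # stand-in for a database sequence


def shorten(body: dict) -> dict:
    """Handle a shorten request: store the long URL, return the short code."""
    code = format(next(_ids), "x")  # placeholder scheme, not the real strategy
    _mappings[code] = body["url"]
    return {"short": code, "full": f"{BASE_URL}/{code}"}


def redirect(code: str) -> tuple:
    """Handle GET /<code>: return (status, headers) for the response."""
    long_url = _mappings.get(code)
    if long_url is None:
        return 404, {}  # assumption: unknown codes get a 404
    return 302, {"Location": long_url}
```

Everything interesting is hidden in the one line that picks `code`, which is why the choice of strategy gets its own discussion.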
Now the real puzzle: what goes in "short"? Seven characters of base62 give you 62^7 ≈ 3.5 trillion possible codes, which is plenty even at bit.ly scale. The question is how to choose each one. There are four common strategies, each with different costs.
The pragmatic answer for most systems: start with auto-increment + base62. It's clean, collision-free, and a single Postgres can comfortably hand out IDs at thousands per second. If the central counter becomes a bottleneck later, migrate to the ID-range lease pattern. Don't over-engineer for scale you don't have.
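One way the ID-range lease pattern could look, as a sketch only: each app server reserves a block of IDs from the central counter in one step, then hands them out locally without further coordination. The class names and the lease size are hypothetical, and a lock-protected integer stands in for what would be a database row or sequence in a real deployment.

```python
import threading


class CentralCounter:
    """Stand-in for the shared counter (in practice a DB row or sequence)."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def lease(self, size: int) -> range:
        """Atomically reserve `size` consecutive IDs and return them."""
        with self._lock:
            first = self._next
            self._next += size
        return range(first, first + size)


class AppServer:
    """Hands out IDs from its current lease; contacts the counter only on refill."""

    def __init__(self, counter: CentralCounter, lease_size: int = 1000) -> None:
        self._counter = counter
        self._lease_size = lease_size
        self._ids = iter(())  # empty until the first lease

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            # Lease exhausted: reserve a fresh block from the central counter.
            self._ids = iter(self._counter.lease(self._lease_size))
            return next(self._ids)
```

With a lease size of 1,000, the central counter sees one request per thousand URLs instead of one per URL. The trade-off is that IDs left in a lease when a server crashes are never used, so the sequence has gaps. That is harmless for short codes.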
The base62 encoding deserves a second look: characters [a-zA-Z0-9] = 62 options per position. With 7 characters you get 62^7 ≈ 3.5 trillion unique codes. Even at 100M URLs/day, that's a 100-year supply. Most short-URL services use 6-8 characters for exactly this reason: short enough to type, long enough to never run out.
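A minimal sketch of base62 encoding for a numeric ID, assuming the alphabet order a-z, A-Z, 0-9 as written above. Real services may order or shuffle the alphabet differently, and the function names are illustrative.

```python
import string

# Assumed alphabet order: a-z, then A-Z, then 0-9 (62 symbols).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Convert a non-negative integer ID into a base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode(code: str) -> int:
    """Convert a base62 short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Every ID below 62^7 = 3,521,614,606,208 encodes to at most 7 characters, so an auto-increment counter stays within 7-character codes for the whole supply estimated above.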
Five stages, each showing how the architecture changes when the previous one breaks. We start with one server doing everything (it handles 100 req/s), and end with a fully distributed system handling millions per second. Every component gets added for a reason: each stage names exactly what just broke and why the new piece was added.
Once the architecture is in place, the read path — what happens on a redirect — is where 99% of CPU time goes. It's also the easiest path to make fast because short-code-to-URL mappings are immutable. Once you've created the mapping, it never changes. That property unlocks aggressive caching with zero invalidation logic. Here's the canonical read path:
Three observations from this diagram. First: the cache absorbs nearly all traffic. A few popular links — a viral tweet, a marketing campaign URL — account for the vast majority of redirects. Even a tiny cache catches them. With 1 GB of Redis you'll cache millions of mappings; hit rates of 95-99% are typical and the read path becomes "Redis lookup → 302" in under 5 milliseconds.
Second: cache invalidation isn't a problem here. Short codes never change. Once cached, the entry is correct forever (or until TTL eviction). This is the dream case for caching — most caching headaches come from staleness. Here there is none. Set a long TTL, let LRU eviction handle the unpopular entries.
Every click generates a "this link was clicked" event you'll want for the dashboard. Never write that synchronously to the DB in the redirect handler. A single extra DB write per redirect would tank your read latency from 5ms to 30ms+.
Instead: fire-and-forget into a message queue (Kafka, SQS, Kinesis). The app server pushes the event to the queue (~1ms) and returns the 302 immediately. A separate analytics worker consumes the queue at its own pace, aggregates into a separate analytics database (often columnar — ClickHouse, BigQuery), and powers the dashboard. Decouples user-facing latency from internal bookkeeping. This pattern recurs in every read-heavy system.
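The same decoupling can be sketched in a single process, with the standard-library `queue.Queue` standing in for Kafka, SQS or Kinesis and a `Counter` standing in for the analytics database. The function names and the event shape are hypothetical.

```python
import queue
import threading
from collections import Counter

events = queue.Queue()    # stand-in for the message queue
click_counts = Counter()  # stand-in for the analytics database


def handle_redirect(code: str, long_url: str) -> tuple:
    """Enqueue the click event and return the redirect without waiting."""
    events.put_nowait({"code": code})
    return 302, {"Location": long_url}


def analytics_worker() -> None:
    """Consume click events at its own pace and aggregate them."""
    while True:
        event = events.get()
        if event is None:  # sentinel: shut the worker down
            events.task_done()
            break
        click_counts[event["code"]] += 1
        events.task_done()


worker = threading.Thread(target=analytics_worker, daemon=True)
worker.start()
```

The redirect handler never waits for the aggregation. A caller that needs the counts to be up to date, for example in a test, can block on `events.join()`. A production queue would also need a policy for what happens when it fills up or the worker falls behind, which this sketch leaves out.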
That's the whole system, summarized: writes go through a central counter into a durable DB; reads hit a cache first with DB fallback; analytics are decoupled via a queue; the read path is what scales. Add a CDN above the LB for global edge caching of the most popular links and you're at bit.ly scale. Every piece you added has a clear job, justified by an actual constraint. That's what good system design looks like.
These are the terms that show up in design interviews, architecture reviews, and incident retros. The fluency you build using them is what marks senior engineers.
[a-zA-Z0-9]: 62 characters per position. 7 chars = 3.5 trillion combinations. Compact, URL-safe, easy to type. The standard for short codes.
Location header. Use 302 (not 301) so changes can propagate; browsers may cache a 301 indefinitely.
You can synthesize the whole stack. Next: image sharing — read-heavy but now with binary objects, which changes everything.
This is the design pattern that scales across most read-heavy systems.
Numbers tell you what's actually hard. Without back-of-envelope math, every architecture decision is decoration.
Read-heavy systems demand caching on the read path and async handling for everything that isn't the redirect itself.
Start with one box. Add each component when something measurably breaks. Don't pre-build for traffic you don't have.