Design a URL Shortener — Complete System Design Walkthrough

§ 01 — A misleadingly small problem

It's just two
operations.

A URL shortener does exactly two things. Shorten: take a long URL, give back a short code. Redirect: take a short code, send the user to the long URL. That's it. So why is bit.ly a 9-figure business and why does this question show up in nearly every senior engineering interview? Because "two operations" hides every distributed-systems problem you've ever learned about — and a few you haven't.

// SOMETHING SHORT POINTS AT SOMETHING LONG

syf.gg/x7k

→

https://www.systemdesigntutorial.com/courses/beginner/modules/m17-build-a-url-shortener?utm_source=twitter&utm_campaign=launch

syf.gg/aB2

→

https://docs.systemdesigntutorial.com/architecture/scaling-strategies/horizontal-vs-vertical/chapter-3

syf.gg/q9F

→

https://github.com/systemdesign/example-app/pull/4421/files#diff-a7b3c2d

The deceptive bit: reads massively outnumber writes. Someone shortens a URL once; tens, hundreds, sometimes millions of people click it. The read/write ratio is often 100:1 or more, and that asymmetry shapes every architecture decision below. Reads need to be fast and horizontally scalable; writes need to be correct, unique, and cheap to store at volume. Different problems, different solutions, same system.

Reads and writes are not the same kind of problem. The art is treating them differently.

Over the next four sections we'll scope the problem, estimate the load, design the API, pick a short-code strategy, and then watch the architecture evolve from a single laptop to a global service — adding one component at a time, each justified by an actual scaling pain.

§ 02 — Scope & estimate before designing

Numbers first,
architecture second.

The mistake juniors make in design interviews, and in real life, is jumping to "let's use Kafka and Cassandra" before knowing whether they actually need either. Senior engineers do the opposite: requirements, then estimation, then design. The numbers tell you what's hard. Without them, every architecture decision is decoration.

// REQUIREMENTS · WHAT WE'RE BUILDING

// FUNCTIONAL

What it does

Shorten: POST a long URL, return a short code
Redirect: GET a short code, send 302 to long URL
Optional: custom aliases (/my-talk)
Optional: expiration, basic click analytics

// NON-FUNCTIONAL

How it must behave

High availability — links break = trust broken
Low read latency — <50ms p99 for redirects
Massive scale — billions of URLs, millions/sec reads
Durable — never lose a mapping

Now the math. We don't need precision — we need order-of-magnitude estimates to know what tools we'll need. Senior engineers do this on a napkin in 60 seconds. Assume bit.ly-scale, post-launch.

// BACK-OF-ENVELOPE — DO THE MATH FAST

100M new URLs per day ÷ 86,400 seconds

~1,200 writes/sec

100:1 read-to-write ratio × 1,200 writes/sec

~120,000 reads/sec

500 bytes per URL × 100M URLs/day

~50 GB/day

50 GB/day × 365 days × 5 years horizon

~90 TB

120k reads/sec at 50ms p99 = cache or die

→ Redis layer required

What these numbers tell us, without writing any code: writes are easy (1,200/sec fits on a single Postgres box), reads are the problem (120k/sec needs caching and horizontal scaling), storage is large but manageable (90 TB is within reach of a single beefy DB cluster), and the latency budget is tight (50ms p99 for an internet round-trip leaves almost no margin for slow queries). Now we know what we're solving for.
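The napkin math above can be reproduced in a few lines. This is a minimal sketch using the inputs from the text (100M URLs/day, 100:1 ratio, 500 bytes per URL, 5-year horizon); the function and key names are made up for illustration. Note that the text rounds 1,157 writes/sec up to 1,200 before multiplying, which is why it lands on 120k reads/sec while the unrounded figure is closer to 116k.

```python
SECONDS_PER_DAY = 86_400


def estimate(new_urls_per_day: int, read_write_ratio: int,
             bytes_per_url: int, horizon_years: int) -> dict:
    """Order-of-magnitude load and storage estimates (hypothetical helper)."""
    writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_gb_per_day = new_urls_per_day * bytes_per_url / 1e9
    storage_tb_total = storage_gb_per_day * 365 * horizon_years / 1e3
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_total": storage_tb_total,
    }


est = estimate(new_urls_per_day=100_000_000, read_write_ratio=100,
               bytes_per_url=500, horizon_years=5)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

Changing one input (say, a 1,000:1 ratio) and rerunning is a quick way to see which conclusion flips first.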

§ 03 — API & short-code generation

Two endpoints,
one hard choice.

The API is small. The interesting decision is buried in step one: how do you generate the short code? Several strategies work; they have very different scaling characteristics. Let's nail the API first, then dig into the trade-off.

// THE TWO ENDPOINTS · MINIMAL VIABLE API

POST /api/shorten

Create a short code for a long URL. Idempotent if same URL submitted by same user.

request:  { "url": "https://example.com/very/long/path?utm=..." }
response: { "short": "x7k", "full": "https://syf.gg/x7k" }

GET /:short

Resolve and redirect. Logs analytics async. Returns HTTP 302.

request:  GET /x7k
response: HTTP/1.1 302 Found
          Location: https://example.com/very/long/path?utm=...
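To make the two endpoints concrete, here is a minimal in-memory sketch of their behavior, with no web framework involved. Everything in it is an assumption for illustration: the function names, the per-user idempotency key, the 404 for unknown codes, and especially `_next_code`, which is only a placeholder generator (a hex counter) standing in for a real short-code strategy.

```python
import itertools

BASE_URL = "https://syf.gg"  # host borrowed from the examples in the text

_counter = itertools.count(1)
_url_by_code = {}        # short code -> long URL
_code_by_user_url = {}   # (user_id, long URL) -> short code


def _next_code() -> str:
    """Placeholder generator: hex of a counter. Not a real strategy."""
    return format(next(_counter), "x")


def shorten(user_id: str, body: dict) -> dict:
    """POST /api/shorten: same user + same URL returns the same code."""
    url = body["url"]
    key = (user_id, url)
    code = _code_by_user_url.get(key)
    if code is None:
        code = _next_code()
        _url_by_code[code] = url
        _code_by_user_url[key] = code
    return {"short": code, "full": f"{BASE_URL}/{code}"}


def redirect(short: str) -> tuple:
    """GET /:short: returns (status, headers)."""
    url = _url_by_code.get(short)
    if url is None:
        return 404, {}
    return 302, {"Location": url}
```

Calling `shorten("u1", {"url": "https://example.com/a"})` twice returns the same response both times, and `redirect` on the returned code yields `(302, {"Location": "https://example.com/a"})`.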

Now the real puzzle: what goes in "short"? You have ~3.5 trillion possible codes if you use 7 characters of base62 (62⁷), which is plenty even at bit.ly scale. The question is how you choose each one. Four common strategies, each with different costs.

// FOUR WAYS TO GENERATE SHORT CODES

Hash & truncate // MD5(url)[:7]

Deterministic: the same URL always yields the same code. The catch is collisions: two different URLs can share the same truncated hash, so you need collision-handling logic, which means checking the store on every write. Also: identical URLs share a code (sometimes wanted, sometimes not).

WORKS

Random + retry // generate, check DB

Generate a random 7-char string and try to insert it. On a collision (primary-key violation), generate another and retry. Simple. The cost is that every insert doubles as a uniqueness check against the DB: fine at 1K writes/sec, painful at 100K.

WORKS

Auto-increment + base62 // id=12345 → "3D7"

The DB assigns a sequential ID; encode it in base62 for compactness. Zero collisions by construction. Downside: codes are predictable (sequential), and you need a central counter, which is a coordination point.

CLEAN

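ID range lease // batched counter

A counter service hands out batches of 10,000 IDs to each app server. Each server assigns IDs within its batch, so there is no counter round-trip per write. Best for huge scale; bit.ly-scale traffic benefits from this, at the cost of extra moving parts. (Twitter's Snowflake scheme is a related approach that removes the counter entirely by composing each ID from a timestamp, a worker ID, and a per-worker sequence.)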

SCALES
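The first two strategies above share the same shape: propose a code, try to insert it, retry on collision. A minimal sketch follows, assuming a plain dict as the store (`setdefault` stands in for an INSERT that fails on a primary-key violation). The helper names and the "append `#attempt` to the URL" salting rule are illustrative assumptions, not something the strategies prescribe.

```python
import hashlib
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters
CODE_LEN = 7


def random_code(url: str = "", attempt: int = 0) -> str:
    """Random + retry: ignores its inputs, returns 7 random base62 chars."""
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LEN))


def hashed_code(url: str, attempt: int = 0) -> str:
    """Hash & truncate: MD5(url)[:7]; later attempts salt the input."""
    salted = url if attempt == 0 else f"{url}#{attempt}"
    return hashlib.md5(salted.encode()).hexdigest()[:CODE_LEN]


def insert_unique(store: dict, url: str, make_code, max_attempts: int = 5) -> str:
    """Try candidate codes until one is free or already maps to this URL."""
    for attempt in range(max_attempts):
        code = make_code(url, attempt)
        existing = store.setdefault(code, url)  # stand-in for INSERT
        if existing == url:
            return code
        # Otherwise the code belongs to a different URL: collision, retry.
    raise RuntimeError("no free code found; consider a longer code")
```

With `hashed_code`, inserting the same URL twice returns the same code; with `random_code`, each call to `insert_unique` produces a fresh one.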

The pragmatic answer for most systems: start with auto-increment + base62. It's clean, collision-free, and a single Postgres can comfortably hand out IDs at thousands per second. If the central counter becomes a bottleneck later, migrate to the ID-range lease pattern. Don't over-engineer for scale you don't have.
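For a feel of what the ID-range lease migration involves, here is a minimal single-process sketch. `CounterService` stands in for whatever durable central counter you would actually use (a DB row, for example), and `AppServer` for one application instance; both names and the locking details are assumptions for illustration.

```python
import threading


class CounterService:
    """Stand-in for the central counter: hands out disjoint ID ranges."""

    def __init__(self, batch_size: int = 10_000):
        self._next = 1
        self._batch_size = batch_size
        self._lock = threading.Lock()

    def lease(self) -> range:
        with self._lock:
            start = self._next
            self._next += self._batch_size
        return range(start, start + self._batch_size)


class AppServer:
    """Assigns IDs locally; contacts the counter only when a batch runs out."""

    def __init__(self, counter: CounterService):
        self._counter = counter
        self._ids = iter(())  # empty until the first lease
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._ids)
            except StopIteration:
                self._ids = iter(self._counter.lease())
                return next(self._ids)
```

One trade-off worth knowing: if a server crashes mid-batch, the unused IDs in its lease are simply lost, so the ID space has gaps. For short codes that is harmless.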

The base62 encoding deserves a second look: the characters [a-zA-Z0-9] give 62 options per position. With 7 characters you get 62⁷ ≈ 3.5 trillion unique codes. Even at 100M URLs/day, that's roughly a 96-year supply. Most short-URL services use 6-8 characters for exactly this reason: short enough to type, long enough to never run out in practice.
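Base62 itself is just repeated division by 62. The sketch below assumes the alphabet order 0-9, A-Z, a-z, chosen because it reproduces the id=12345 → "3D7" example from the strategy list; a different ordering is equally valid as long as encode and decode agree.

```python
import string

# Assumed ordering: digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Decode a base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(12345))                 # 3D7
print(62 ** 7)                       # 3521614606208 codes of length <= 7
print(62 ** 7 // 100_000_000 / 365)  # years of supply at 100M URLs/day
```

Because decoding recovers the integer ID, the redirect path can look up the mapping by primary key without storing the code as a separate column, if you choose to.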

§ 04 — Architecture evolution · interactive lab

Watch one box
become a system.

Five stages. Each one shows how the architecture changes when the previous one breaks. We start with one server doing everything (it handles about 100 req/s) and end with a fully distributed system handling millions per second. Every component gets added for a reason: each stage explains exactly what just broke and why the new piece was added.

// STAGE 1 · DAY 1

One box does it all

Single server with the web server, app, and database all running on the same host. Like a hackathon project or first launch — every system starts here. Handles real traffic, fits in your head, deploys in minutes.

// CURRENT CAPACITY

~100 req/s

single 4-core box · sufficient for early days

⚠ NEXT BOTTLENECK

Database I/O on the same box as the app. Once traffic grows, DB queries compete with HTTP handling for CPU and disk.

§ 05 — The read path & analytics

Two paths,
different physics.

Once the architecture is in place, the read path — what happens on a redirect — is where 99% of CPU time goes. It's also the easiest path to make fast because short-code-to-URL mappings are immutable. Once you've created the mapping, it never changes. That property unlocks aggressive caching with zero invalidation logic. Here's the canonical read path:

// THE READ PATH · CACHE FIRST, DB FALLBACK

Three observations from this diagram. First: the cache absorbs nearly all traffic. A few popular links — a viral tweet, a marketing campaign URL — account for the vast majority of redirects. Even a tiny cache catches them. With 1 GB of Redis you'll cache millions of mappings; hit rates of 95-99% are typical and the read path becomes "Redis lookup → 302" in under 5 milliseconds.

Second: cache invalidation isn't a problem here. Short codes never change. Once cached, the entry is correct forever (or until TTL eviction). This is the dream case for caching — most caching headaches come from staleness. Here there is none. Set a long TTL, let LRU eviction handle the unpopular entries.
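The cache-first, DB-fallback path described above is the classic cache-aside pattern. Here is a minimal sketch in which a small LRU class stands in for Redis and a plain dict stands in for the database; the class names and the hit/miss counters are assumptions added for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process stand-in for a Redis cache with LRU eviction."""

    def __init__(self, capacity: int):
        self._data = OrderedDict()
        self._capacity = capacity

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used


class Resolver:
    """Read path: check the cache first, fall back to the DB on a miss."""

    def __init__(self, db: dict, cache: LRUCache):
        self.db, self.cache = db, cache
        self.hits = self.misses = 0

    def resolve(self, short: str):
        url = self.cache.get(short)
        if url is not None:
            self.hits += 1
            return url
        self.misses += 1
        url = self.db.get(short)
        if url is not None:
            self.cache.put(short, url)  # never needs invalidating
        return url

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Notice there is no invalidation code anywhere: because a mapping never changes after creation, the only way an entry leaves the cache is eviction.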

// THE KEY INSIGHT — ASYNC ANALYTICS

Don't slow your read path for analytics

Every click generates a "this link was clicked" event you'll want for the dashboard. Never write that synchronously to the DB in the redirect handler. A single extra DB write per redirect would tank your read latency from 5ms to 30ms+.

Instead: fire-and-forget into a message queue (Kafka, SQS, Kinesis). The app server pushes the event to the queue (~1ms) and returns the 302 immediately. A separate analytics worker consumes the queue at its own pace, aggregates into a separate analytics database (often columnar — ClickHouse, BigQuery), and powers the dashboard. Decouples user-facing latency from internal bookkeeping. This pattern recurs in every read-heavy system.
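The fire-and-forget pattern can be sketched in a single process with Python's standard `queue` module standing in for Kafka, SQS, or Kinesis, and a `Counter` standing in for the analytics database. The function names, the event shape, and the `None` shutdown sentinel are all assumptions for illustration.

```python
import queue
import threading
from collections import Counter

events = queue.Queue()    # stand-in for the message queue
click_counts = Counter()  # stand-in for the analytics database


def handle_redirect(short: str, mappings: dict) -> tuple:
    """Redirect handler: enqueue the click event, return immediately."""
    url = mappings.get(short)
    if url is None:
        return 404, {}
    events.put_nowait({"short": short})  # no DB write on the hot path
    return 302, {"Location": url}


def analytics_worker() -> None:
    """Consumes events at its own pace; None is a shutdown sentinel."""
    while True:
        event = events.get()
        try:
            if event is None:
                return
            click_counts[event["short"]] += 1
        finally:
            events.task_done()
```

To try it, start the worker with `threading.Thread(target=analytics_worker, daemon=True).start()`, call `handle_redirect` a few times, then `events.join()` to wait until the worker has drained the queue before reading `click_counts`.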

That's the whole system, summarized: writes go through a central counter into a durable DB; reads hit a cache first with DB fallback; analytics are decoupled via a queue; the read path is what scales. Add a CDN above the LB for global edge caching of the most popular links and you're at bit.ly scale. Every piece you added has a clear job, justified by an actual constraint. That's what good system design looks like.

§ 06 — Eight words for system-design conversations

Vocabulary,
for the whiteboard.

These are the terms that show up in design interviews, architecture reviews, and incident retros. The fluency you build using them is what marks senior engineers.

Base62

/ˌbeɪs ˈsɪksti tuː/

Encoding using [a-zA-Z0-9] — 62 characters per position. 7 chars = 3.5 trillion combinations. Compact, URL-safe, easy to type. The standard for short codes.

302 Redirect

/θriː oʊ tuː/

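HTTP 302 Found, commonly used as a temporary redirect. The response a URL shortener returns; the browser follows the Location header. Use 302 rather than 301 so every click still reaches your servers (for analytics, expiration, or takedowns); browsers may cache a 301 indefinitely and stop asking.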

Read/Write Ratio

/riːd raɪt ˈreɪʃiˌoʊ/

The proportion of reads to writes in a workload. URL shorteners are extremely read-heavy (100:1+). Shapes whether you optimize for read scaling, write scaling, or both.

Async Processing

/ˈeɪsɪŋk/

Doing work outside the request/response cycle by pushing events to a queue. The request returns fast; a worker handles the actual processing later. Powers analytics, emails, indexing.

Capacity Estimation

/kəˈpæsɪti/

Back-of-envelope math done before any design: req/sec, storage/year, latency targets. Forces you to know what's actually hard before choosing tools.

Cache Hit Rate

/kæʃ hɪt reɪt/

Percentage of requests answered by the cache without hitting the DB. URL shorteners typically hit 95-99% due to skewed popularity. The single biggest lever for latency at scale.

Sharding

/ˈʃɑːdɪŋ/

Splitting a dataset across multiple DBs (by user_id, key range, hash). What you do when one DB can't hold or serve all the data. Adds complexity; worth it at scale.

Message Queue

/ˈmɛsɪdʒ kjuː/

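A durable buffer between producers and consumers, typically first-in, first-out (ordering guarantees vary by product and are often per partition or shard). Kafka, RabbitMQ, SQS, Kinesis. Decouples services in time so neither blocks the other. The backbone of async architecture.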
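As a small illustration of the Sharding entry above, here is a sketch of hash-based shard selection for short codes. The function name and the choice of SHA-256 are assumptions; the one firm requirement is a hash that is stable across processes, which rules out Python's built-in `hash()` for strings because it is salted per process.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for code in ("x7k", "aB2", "q9F"):
    print(code, "-> shard", shard_for(code, num_shards=4))
```

Plain modulo sharding has a known cost: changing `num_shards` remaps most keys. Consistent hashing is the usual answer when shards need to be added without a bulk migration.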

§ 08 — The recap

Three ideas to
carry forward.

This is the design pattern that scales across most read-heavy systems.

i

Estimate, then design

Numbers tell you what's actually hard. Without back-of-envelope math, every architecture decision is decoration.

ii

Reads and writes are different

Read-heavy systems demand caching on the read path and async handling for everything that isn't the redirect itself.

iii

Architecture evolves with scale

Start with one box. Add each component when something measurably breaks. Don't pre-build for traffic you don't have.
