email = ?) and ranges (created_at > ?)How does a database find one row out of ten million in under a millisecond? The answer is a 50-year-old data structure quietly running half the internet. Crack open the B-tree and SQL stops looking like magic.
Your users table has 10 million rows. You ask the database: "give me the user whose email is alice@example.com." If the database had no clever tricks, it would do the obvious thing — check row 1, check row 2, check row 3, all the way to row ten million. On average it would touch 5 million rows before finding hers. Watch what that looks like across different table sizes:
Five seconds to find a single user is unacceptable. Fifty seconds is an outage. And yet databases do this lookup all the time — on tables with billions of rows — and return answers in single-digit milliseconds. Something very clever has to be happening, and that something is called an index.
Forget databases for a second. Think about a physical library with a million books. If someone asks you "do you have a book called The Great Gatsby?", you don't walk to every shelf. You walk to the card catalog — an alphabetical drawer of cards, each pointing to a book's location. You flip to "G," find the card, walk to the shelf number it lists, and you're done. Three minutes instead of three weeks.
Check every book in the library, one by one. Each one is "is this Gatsby? no... is this Gatsby? no..." For a million books, you're checking 500,000 on average before finding it.
Open the catalog drawer. Cards are sorted. You can skip half the alphabet at every step. Twenty checks gets you to any card in a million. Then walk straight to the shelf.
That's it. An index in a database is the card catalog. It's a separate, sorted structure that maps a column's values to the row locations. When you say SELECT * FROM users WHERE email = 'alice@...', the database checks if there's an index on email. If yes, it walks the index — a logarithmic number of steps — and jumps straight to the row. If no, it falls back to scanning every single row.
The structure that databases actually use for this isn't a stack of paper cards, but the idea is exactly the same — sorted, skippable, designed for fast lookups. The structure is called a B-tree (it stands for "balanced," not "binary"), and once you see how it works, the speed makes total sense.
A B-tree is a tree where each node holds many keys (not just one, like a binary tree). Each node stores sorted keys and pointers to child nodes. When you search, you compare the target against the keys in one node and descend to the child whose range covers your value. Because each node has many children, the tree stays very shallow — even a billion rows fits in 3 or 4 levels.
That picture is the whole secret. To find row 4,267,891 in a table of 10 million rows, the database checked just 3 nodes. The root node ("is it before or after 2.5M / 5M / 7.5M?"), one internal node ("is it before or after 3M / 4M / 4.5M?"), and the leaf node ("here's the exact key and its pointer to the actual row"). Three reads instead of 5 million. That's the difference between 5 milliseconds and 5 seconds.
The depth grows logarithmically. A B-tree with 100 keys per node holds about a million rows in 3 levels, a billion rows in 4 levels, and a trillion rows in 5. That ceiling is why databases scale so well. Doubling your data adds zero levels most of the time.
Pick a table size and hit Race. On the left, a sequential scan starts checking rows one by one. On the right, a B-tree lookup descends three levels and stops. The numbers tell the story even better than the animation.
Each row size shows a different ratio. At 10K rows, scanning is annoying but survivable. At 1M, it's the difference between snappy and sluggish. At 100M, it's the difference between working and broken. This is why indexes aren't optional at scale.
If indexes made reads a million times faster with no downside, every column would have one. They don't, because indexes cost real things. Understanding what they cost is the difference between a sluggish database and a fast one.
Every INSERT, UPDATE, or DELETE must also update every index that includes the changed column. A table with 5 indexes does 6× the work on every write. If your app writes more than it reads, indexes can hurt.
An index is a separate B-tree stored on disk. It can be huge — sometimes as large as the table itself. The hot parts get cached in RAM, competing with the data itself for memory. More indexes = more storage cost.
An index on email doesn't speed up queries on name. Worse: with a query the optimizer can't recognize, it might ignore your index entirely and scan anyway. Indexes are precise tools, not magic dust.
An index on a column with only 2-3 distinct values (like is_active) barely helps — every value still matches millions of rows. Indexes work best on columns with many unique values: emails, IDs, timestamps.
email = ?) and ranges (created_at > ?)ORDER BY) frequently, especially with LIMITThe pragmatic workflow: don't pre-emptively index everything. Build the app, run real queries, look at slow query logs, then add an index where one is genuinely missing. PostgreSQL's EXPLAIN ANALYZE will literally show you when it's scanning vs. using an index — read those plans, and your intuition will sharpen fast.
You'll see these in every conversation about slow queries. Know them well.
(user_id, created_at). Order matters — works for queries on user_id alone, or both, but not created_at alone.EXPLAIN shows you what it picked.Quick test of the index intuition. Click an answer; get the explanation.
You see the planner now. Onward — caching is next, and it shaves even more milliseconds.
The mental model of "scan vs. tree" is one of the most useful things you'll ever learn.
Not magic. A separate B-tree mapping column values to row locations. The database checks for one; if it exists, it uses it.
A billion rows fits in 4 tree levels. Doubling your data adds about zero work. This is why databases scale to massive sizes.
Every write must update every index. Pick the few that matter. Read query plans (EXPLAIN) to know what's actually being used.