Module 05 / 20 · Phase B — Data & Storage · 44 min

SQL vs NoSQL,
without the hype.

A decade of internet arguments boiled down to engineering trade-offs. Two database families exist for good reasons — and most apps eventually use both. Time to stop picking sides and start picking the right tool.

// What you'll know by the end

  • Why databases exist (and why files don't cut it)
  • The relational model in one sitting
  • The four flavors of NoSQL
  • When to pick which — and why "both" is often right
§ 01 — Why this exists at all

Imagine storing
Twitter in a
text file.

Every database tutorial starts with syntax. Bad move. The first thing to understand is why databases exist in the first place. The shortest path is to imagine what happens without one. You start with a text file — easy! Append a line for each tweet, read the file to load them. For about 30 seconds, this works.

// THE TWO PATHS FOR STORING DATA
✗ FILES (DIY)
  • Two writes at once = data corruption
  • Search = scan every line, every time
  • Want sorted? Re-sort the whole file
  • Server crash mid-write = mangled file
  • No way to express relationships
  • 10M rows = 10M scans per query
  • Multiple readers? Hope for the best
✓ DATABASES
  • Concurrency: many writers at once, safely
  • Indexes: find any row in microseconds
  • Queries: ask in declarative language
  • Durability: survives crashes, power loss
  • Relationships: foreign keys, joins, refs
  • Transactions: all-or-nothing changes
  • Replication: copy across machines

Databases exist to solve all the problems on the left. They're not magic — they're the result of 50 years of engineers refusing to write the same bugs over and over. Once you see them this way, the question "SQL or NoSQL?" stops being about taste and becomes a question about which set of problems you most need solved.

§ 02 — The relational model

Tables. Rows.
Relationships.

The relational model was invented in 1970 by Edgar Codd, and has dominated databases ever since for one reason: it's brilliantly simple. You organize your data into tables (one for each kind of thing). Each table has columns (the fields) and rows (the actual records). Tables connect to each other through shared values — foreign keys.

// EXAMPLE: A TINY BLOG SCHEMA

USERS id PK · int name varchar email varchar created_at timestamp POSTS id PK · int user_id FK → users title varchar body text published bool COMMENTS id PK · int post_id FK → posts user_id FK → users body text created_at timestamp 1:N 1:N users.id → comments.user_id
-- Get every post with its author and comment count SELECT posts.title, users.name AS author, COUNT(comments.id) AS comment_count FROM posts JOIN users ON posts.user_id = users.id LEFT JOIN comments ON comments.post_id = posts.id GROUP BY posts.id, users.name ORDER BY comment_count DESC;

That SQL query is the superpower. You describe what you want, not how to get it. The database figures out how to find the rows efficiently — which indexes to use, which order to join tables in, how to compute the count. The same query works whether the tables hold 100 rows or 100 million. This is why SQL has outlasted every database trend for half a century.

Relational means: data lives in tables, and tables connect through values.

The downside: you have to plan ahead. You define a schema — the columns, their types, the relationships — before you can store anything. Changing the schema later (adding a column, splitting a table) is doable but takes effort. For most apps, this is a feature, not a bug: enforced structure prevents data chaos.

§ 03 — The four flavors of NoSQL · interactive

"NoSQL" is four
different things.

"NoSQL" is the laziest umbrella term in computing. It just means "not the relational model." Underneath that label are four genuinely different families of databases, each solving a different problem. Click any card to see how it works.

Document

// the JSON store
MongoDB · Couchbase · Firestore

Store nested JSON-like documents. Each document is a complete record — no joins needed because related data lives inside the same document.

Why pick it: Your data is naturally shaped like documents (user profiles, blog posts, products). Each "thing" is mostly self-contained. You want to evolve the structure without schema migrations.

Where it hurts: When the same data needs to live in multiple documents (a user's name appearing on all their posts). Updates become expensive. Many-to-many relationships get painful.
click to expand

Key-Value

// dead simple, wickedly fast
Redis · DynamoDB · Memcached

A giant dictionary: give it a key, get back a value. That's it. No queries, no joins — just lookups. Often kept entirely in memory, so blazing fast.

Why pick it: Session storage. Caches. Rate limiters. Feature flags. Anything where you know the exact key and just want the value as fast as possible. Often used alongside SQL, not instead of it.

Where it hurts: You can't ask questions like "find all values where X". The only access pattern is "I know the key." Modeling anything complex means inventing your own key conventions.
click to expand

Wide-Column

// for the petabyte club
Cassandra · HBase · ScyllaDB

Looks like SQL — rows and columns — but rows can have wildly different columns, and the database is engineered to scale horizontally across hundreds of machines.

Why pick it: You have so much data that no single machine can hold it. Time-series data, IoT events, large-scale analytics. You're willing to design your queries upfront for the database to optimize.

Where it hurts: Operationally complex. Joins basically don't exist. You denormalize aggressively. Almost never the right choice for a starting team.
click to expand

Graph

// when relationships are the data
Neo4j · Neptune · ArangoDB

Data is nodes (things) and edges (relationships between them). Built for queries like "friends of friends" or "shortest path" — the kinds of questions that make SQL cry.

Why pick it: Social networks. Fraud detection. Knowledge graphs. Recommendation engines. Any problem where the relationships matter as much as the entities — and traversal depth matters.

Where it hurts: Most apps don't have graph-shaped data. Even social apps work fine in SQL for the first few million users. Graph databases shine in specific domains; outside them, they're overkill.
click to expand

Each of these solves a real problem. None of them is "better" than relational — they're different shapes for different needs. The pragmatic truth is that most modern systems use relational as the source of truth, plus key-value for caching, plus maybe one of the others if a specific problem demands it. We'll see this in the lab next.

§ 04 — Query playground · interactive lab

Same question.
Two answers.

Pick a scenario below and see exactly how the same business question is modeled and answered in SQL (relational) versus NoSQL (document-style). The data is the same; the shape, query, and trade-offs are not.

QUERY_PLAYGROUND.SIM // m.05 lab
⌘ RELATIONAL · SQL Postgres-flavored
// Data model

          
// Query

          
// Result

          
⌘ DOCUMENT · NoSQL MongoDB-flavored
// Data model

          
// Query

          
// Result

          
// THE TRADE-OFF · WHAT YOU'RE SEEING
Pick a scenario

Each scenario shows the same job done two ways. Pay attention to which side does more work at write time and which does more at read time. That's the heart of the SQL/NoSQL trade-off.

§ 05 — ACID, BASE, and the real-world choice

Two letters,
two philosophies.

You'll hear "ACID" and "BASE" in any database conversation lasting longer than five minutes. They describe different sets of guarantees a database provides when things get hard — concurrent writes, network failures, server crashes. The names are clever; the ideas are simple.

ACID

// the SQL default — strong promises
  • AAtomicity — all-or-nothing: a transaction fully completes or fully rolls back. Never half-done.
  • CConsistency — the database always satisfies its rules (constraints, foreign keys) before and after each transaction.
  • IIsolation — concurrent transactions don't see each other's half-written state.
  • DDurability — once committed, data survives crashes. Even if the power dies the next millisecond.

BASE

// classic NoSQL — relaxed for scale
  • BABasically Available — the system stays responsive, even if some data is temporarily stale.
  • SSoft state — data may change without explicit updates, as background processes sync.
  • EEventual consistency — given enough time without writes, all nodes will agree. Eventually.
  • Trades guarantees for availability and scale.

The old narrative was: "ACID is SQL, BASE is NoSQL, pick your trade-off." The current reality is messier and better. Most major NoSQL databases now offer ACID transactions (MongoDB, DynamoDB, Cosmos). Most SQL databases offer eventual consistency modes for replicas. The line between the two is fuzzy and shrinking.

The practical question is no longer "ACID or BASE?" It's: "For this specific operation, do I need strong consistency right now, or can I tolerate seeing slightly stale data for better speed and availability?" Banking transactions: strong. Like counts on a social feed: eventual. Same app, different choices per feature.

// REAL-WORLD DECISION GUIDE

You're starting a new product and unsure of the data shape
SQL first
User profiles, products, orders — structured business data with clear relationships
SQL
Session storage, caches, rate limiters, real-time leaderboards
Key-Value
Catalogs of varied, semi-structured items (product specs vary per category)
Document
Time-series data: metrics, logs, IoT, billions of small writes
Wide-Column
"People you may know," fraud rings, knowledge networks
Graph
A mature production app at scale
Usually both

The rule of thumb that holds up after a decade: start with PostgreSQL (or MySQL). Add Redis for caching. Add a specialized store only when you have a specific reason. Don't choose NoSQL because it's trendy or because someone told you SQL doesn't scale (it absolutely does). Choose the data store that matches your actual access patterns — and don't be afraid to use more than one.

§ 06 — Eight words for this terrain

Vocabulary,
for data.

These pop up in every database discussion. Learn them well enough to use them precisely.

Schema
/ˈskiːmə/
The shape of your data — tables, columns, types, constraints. SQL has strict schemas; document stores have flexible ones. "Schema migration" = changing the shape.
JOIN
/dʒɔɪn/
A SQL operation that combines rows from multiple tables based on matching column values. The signature move of relational databases — and what NoSQL skips.
Foreign Key
/ˈfɒrɪn kiː/
A column that references another table's primary key. posts.user_id is a foreign key pointing to users.id. Enforces referential integrity.
Transaction
/trænˈzækʃən/
A group of operations treated as one unit — all succeed or all roll back. The mechanism behind the "A" in ACID. Essential for anything involving money.
Normalization
/ˌnɔːməlaɪˈzeɪʃən/
Designing tables so each piece of data lives in exactly one place. Eliminates duplication. Pays for itself in updates; costs you in reads (needs joins).
Denormalization
/diːˌnɔːməlaɪˈzeɪʃən/
The opposite — deliberately duplicating data for faster reads. NoSQL document stores do this by default. SQL apps do it for performance hot spots.
Replica
/ˈrɛplɪkə/
A copy of the database running on another machine. Read replicas spread query load; standbys take over if the primary dies. Both SQL and NoSQL support this.
Sharding
/ˈʃɑːdɪŋ/
Splitting one logical database across many machines, with each holding a slice of the data. Hard to do right; necessary at huge scale. We'll go deeper later.
§ 07 — Knowledge check

Five questions.
Pick the right tool.

Testing intuition, not memorization. Click an answer; you'll see why instantly.

QUESTION 1 OF 5
Loading question...
Score: 0 / 5
5 / 5

Pragmatist.

You don't pick sides — you pick tools. That's the right instinct.

§ 08 — The recap

Three ideas to
carry forward.

Forget the tribal debates. Carry these instead.

i

SQL is the safe default

The relational model has won for 50 years because tables + JOINs + transactions handle 90% of business problems beautifully. Start here unless you have a reason not to.

ii

"NoSQL" is four families

Document, Key-Value, Wide-Column, Graph. Each solves a specific problem. Treat them as specialized tools, not a SQL replacement.

iii

Real apps use both

Postgres for the source of truth, Redis for caching, maybe a graph DB for specific features. The question is never "which one" — it's "which combination."

↓ UP NEXT

M.06 — Indexes &
the B-Tree secret.

SQL queries don't scan every row. They use indexes — clever data structures that let databases find any row in milliseconds even at scale. Time to look inside the B-tree, the structure quietly running half the internet.

Continue to Module 06 →