Module 09 / 20 · Phase C — Scale & Reliability · 40 min

Vertical vs
horizontal scaling.

One server isn't enough. Now what? You get two answers — and they take your system in completely different directions. The choice you make today shapes the next five years of architecture.

// What you'll know by the end

  • The difference between scaling up and scaling out
  • Why some things scale easily and some don't
  • When each strategy hits its wall
  • Why mature systems usually end up using both
§ 01 — The moment of truth

Remember the
switchboard?

Back in Module 02, you cranked up the clients on a single server and watched the queue overflow. That was the universe telling you something: every single machine has a ceiling. CPU runs out. Memory fills up. Network port saturates. Disk I/O bottlenecks. Once you hit any one of these, your perfectly designed system stops responding. The question isn't whether you'll hit that wall — it's what you do when you do.

// ONE SERVER · TWO POSSIBLE FUTURES
// 9:00 AM
Tuesday morning
Traffic 2,000 req/s
CPU 45%
p99 latency 120 ms
Status healthy
// 9:00 PM · launch
Tuesday night
Traffic 80,000 req/s
CPU 100% pinned
p99 latency ∞ (timeouts)
Status on fire

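The fundamental insight is this: servers don't slow down in proportion to load. Response times stay nearly flat for a long while, then climb steeply as utilization approaches 100%, because requests arrive faster than they can be served and start to queue. A server at 80% CPU usually runs fine. A server at 99% CPU is on the edge. A server pinned at 100% times out, which to users looks the same as unreachable. So when traffic grows, and it always does, you need a plan. Two plans exist. They lead to different futures.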
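
To put a shape on that cliff, here is a minimal sketch using the textbook M/M/1 queueing formula, where mean response time equals service time divided by (1 - utilization). The single-queue model and the 10 ms service time are assumptions chosen for illustration; they are not measurements from the Module 02 switchboard.

```python
def mean_response_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        # At 100% utilization the queue grows without bound.
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    SERVICE_MS = 10.0  # assumed time to serve one request on an idle server
    for rho in (0.45, 0.80, 0.90, 0.99):
        print(f"{rho:.0%} busy -> {mean_response_ms(SERVICE_MS, rho):7.1f} ms")
```

Under these assumptions the same request takes about 18 ms at 45% busy, 50 ms at 80%, 100 ms at 90%, and roughly 1,000 ms at 99%. The last few percent of utilization cost more than everything before them.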

§ 02 — The two answers

Scale up.
Or scale out.

There are exactly two ways to handle more load. You either get a bigger machine (scale up, also called vertical scaling) — more CPU, more RAM, faster disk on the same server. Or you get more machines (scale out, also called horizontal scaling) — distribute work across many smaller servers. They sound like a minor distinction. They're not. They lead to two completely different architectures.

// SCALING UP vs SCALING OUT · THE METAPHOR

// VERTICAL · scale up
A taller tower
one machine, more powerful
Simple. Your code doesn't change. The OS sees more CPU, more RAM. Most apps just go faster automatically.
No coordination needed. Still one machine — no distributed systems problems to solve.
Has a ceiling. Even AWS's biggest box has a fixed top-end. You can't scale beyond what one machine can be.
Single point of failure. One machine = one place to fail. When it does, everyone is down.
// HORIZONTAL · scale out
More towers
many machines, distributed work
Practically unlimited. Need 10× capacity? Add 10× more boxes. Need 100×? Add 100×.
Fault tolerant. One server dies — the other N-1 keep serving. Users barely notice.
Complex. Requires a load balancer. State must move out of memory. Distributed systems problems appear.
Some things resist. Databases especially — sharding a SQL database is one of the hardest things in computing.
Scale up: simpler today, ceiling tomorrow.
Scale out: harder today, no ceiling.

The right move is usually obvious in hindsight. Early-stage products scale up: it's cheaper, simpler, and you don't know yet whether your product will need anything else. Mature products scale out: traffic is high enough that no single machine could ever hold it, and you've earned the engineering investment by then. The art is knowing when to make the switch — usually before you have to.
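
If you do go horizontal, the first question is how many boxes. A rough sizing sketch, using the Tuesday numbers from § 01 (2,000 req/s at 45% CPU, 80,000 req/s at launch): the function name, the assumption that CPU grows linearly with traffic, the 70% target utilization, and the single spare server are all illustrative choices, not rules from this module.

```python
import math


def servers_needed(peak_rps: float, measured_rps: float, measured_cpu: float,
                   target_cpu: float = 0.70, spares: int = 1) -> int:
    """Estimate server count for a peak load from one observed data point."""
    # Assumption: CPU use scales linearly with request rate.
    per_server_max = measured_rps / measured_cpu   # req/s at 100% CPU
    usable = per_server_max * target_cpu           # req/s we are willing to run at
    return math.ceil(peak_rps / usable) + spares   # spares cover a failed box


if __name__ == "__main__":
    print(servers_needed(peak_rps=80_000, measured_rps=2_000, measured_cpu=0.45))
```

With these inputs one server tops out near 4,400 req/s, so launch night needs 26 servers at 70% utilization plus one spare: 27 boxes of the same size as the one that caught fire.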

§ 03 — Stateless wins, stateful loses

Not everything
scales equally.

Horizontal scaling has a giant hidden requirement: each request must be independent. If your code remembers things between requests — sessions in memory, locks, in-process counters — those memories live on one specific server. Send the next request to a different server, and the memory is gone. This is the difference between stateless and stateful systems, and it determines what scales easily and what doesn't.

// HOW EASILY EACH PART SCALES HORIZONTALLY

Static files // CSS, JS, images
Identical bytes on every server. Add 100 web servers, no coordination needed — each one can serve the same files. CDNs make this even easier.
TRIVIAL
Stateless app servers // REST API, business logic
Each request carries its own auth token, reads from a shared DB, returns. Servers don't need to know about each other. Designing this way is the whole prerequisite for horizontal scale.
EASY
Read-only databases // read replicas
Spawn copies, point each one at the primary, balance reads across them. Writes still go to one place, but most workloads are read-heavy, so this absorbs most growth.
MODERATE
In-memory caches // Redis, Memcached
Sharding a cache is possible (consistent hashing, etc.) but adds complexity. Each cache node holds a different subset of keys, so every client must agree on which node owns which key. When they disagree, reads miss or return stale data.
MODERATE
Write-heavy databases // the primary write node
Sharding a SQL database — splitting one logical DB across N machines by key — is famously hard. Cross-shard queries, joins, transactions, and rebalancing all become painful.
HARD
Stateful sessions // in-memory user data
A login session held in app server memory means subsequent requests must hit the same server. Breaks load balancing. Move sessions to Redis or a token — this is the most common refactor before going horizontal.
HARD
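
The cache row above mentions consistent hashing. Here is a small sketch of why it matters when you add a node: with a hash ring, only the keys claimed by the new node move, while naive `hash % N` placement reshuffles most of them. The `HashRing` class, the node names, and the choice of 100 virtual nodes are hypothetical; real cache clients ship their own implementations.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for smoother balance."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    """Share of keys whose owning node changed between two placements."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
three = ["cache-a", "cache-b", "cache-c"]
four = three + ["cache-d"]

ring3, ring4 = HashRing(three), HashRing(four)
ring_moved = moved_fraction(ring3.node_for, ring4.node_for, keys)
mod_moved = moved_fraction(lambda k: three[_hash(k) % 3],
                           lambda k: four[_hash(k) % 4], keys)

if __name__ == "__main__":
    print(f"ring: {ring_moved:.0%} of keys moved, modulo: {mod_moved:.0%} moved")
```

Going from three nodes to four, the ring should move roughly a quarter of the keys (all of them to the new node), while modulo placement moves about three quarters. Every moved key is a cold cache entry, which is why the placement scheme matters.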

The lesson: horizontal scaling rewards stateless design. Push all the "remembering" out of your app servers — into databases, caches, and tokens — and your app tier becomes trivial to scale. Keep it in memory and you've trapped yourself on one box. This is why every "12-factor app" guide preaches statelessness. It's also why scaling databases is a separate, harder problem we'll touch on later.
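
The trap is easy to reproduce in a few lines. This sketch sends two consecutive requests (a login, then a check) through a round-robin balancer, first to servers that keep sessions in their own memory, then to servers that share a store. The class names are hypothetical, and the plain Python `set` stands in for Redis or a database.

```python
from itertools import cycle


class StatefulServer:
    """Keeps sessions in its own process memory."""

    def __init__(self):
        self.sessions = set()

    def login(self, user):
        self.sessions.add(user)

    def is_logged_in(self, user):
        return user in self.sessions


class StatelessServer:
    """Keeps nothing locally; every lookup goes to the shared store."""

    def __init__(self, store):
        self.store = store

    def login(self, user):
        self.store.add(user)

    def is_logged_in(self, user):
        return user in self.store


def login_then_check(servers, user):
    """Round-robin two requests: the login lands on one server, the check on the next."""
    balancer = cycle(servers)
    next(balancer).login(user)
    return next(balancer).is_logged_in(user)


stateful_ok = login_then_check([StatefulServer(), StatefulServer()], "ada")

shared_store = set()  # stand-in for Redis or a database
stateless_ok = login_then_check(
    [StatelessServer(shared_store), StatelessServer(shared_store)], "ada")

if __name__ == "__main__":
    print(stateful_ok, stateless_ok)  # False True
```

The stateful pair forgets the user as soon as the second request lands on the other box. The stateless pair doesn't care which server answers, and that property is what lets you add servers freely.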

§ 04 — Scaling decision lab · interactive

Now feel the
two paths diverge.

Below: a side-by-side view of both strategies under the same traffic. Pick a load level — light to massive — and watch the vertical box swell, the horizontal grid multiply. Then read the numbers. By the time you hit 1M req/s, only one of these paths is still viable.

SCALING_DECISION.SIM // m.09 lab
TRAFFIC: 1K req/s (light)
⌘ VERTICAL · SCALE UP
A bigger box
one server, more powerful
Instance type: t3.large
vCPU / RAM: 2 / 8 GB
Monthly cost: $50
Failure tolerance: None (SPOF)
✓ Handles it comfortably
⌘ HORIZONTAL · SCALE OUT
More boxes
many small servers behind a load balancer
Server count: 2
Each (vCPU/RAM): 2 / 8 GB
Monthly cost: $120
Failure tolerance: 1 server can die
✓ Handles it · redundant
// THE VERDICT AT THIS SCALE
Light traffic — either works

At 1K req/s, both strategies are viable. Vertical is slightly cheaper (one small box vs two), but horizontal already buys you redundancy. Most teams start here with vertical scaling — it's the simpler path until you have a reason to change.

§ 05 — The cost curve

Why everyone
eventually goes horizontal.

For small traffic, vertical scaling is cheaper. One $50 box beats the $120 two-box setup from the lab above. But the cost curve for vertical scaling turns brutally non-linear toward the top end: each step up costs much more and buys proportionally less. A 16× bigger server doesn't cost 16× more; at the extreme end it can cost 50× or 100× more. The largest machines are scarce, specialized hardware, and cloud providers price them accordingly.

// COST PER MONTH AS TRAFFIC GROWS · APPROXIMATE

[Chart: monthly cost ($0 to $60K) plotted against requests per second (0 to 1M) for two lines, Vertical (scale up) and Horizontal (scale out). Annotations: "vert ≈ horiz" and "✗ NO BIGGER BOX EXISTS".]

Look at where the lines diverge. Up to about 10K req/s they're nearly identical. At 100K, vertical is starting to balloon — you're buying expensive big-iron tiers. At 500K, vertical is hitting six figures monthly. At 1M req/s, vertical literally can't get there — no single off-the-shelf machine can do it. Horizontal stays roughly linear: 2× the traffic ≈ 2× the boxes ≈ 2× the cost.
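
A toy model makes the two shapes concrete. Every number here is an assumption chosen only to mimic the behavior described above, not the chart's actual data: vertical tiers that double in capacity while the price multiplies by 2.5, a largest tier of 512K req/s, and horizontal boxes at $60 each handling 1,000 req/s with a minimum of two for redundancy.

```python
import math

# Hypothetical vertical tiers as (max req/s, monthly $).
# Each step doubles capacity but multiplies price by 2.5.
VERTICAL_TIERS = [(1_000 * 2**i, 50 * 2.5**i) for i in range(10)]

BOX_RPS = 1_000   # assumed capacity of one small horizontal box
BOX_COST = 60     # assumed monthly price of one small box
MIN_BOXES = 2     # keep at least two for redundancy


def vertical_cost(rps):
    """Price of the smallest single machine that fits, or None if none exists."""
    for capacity, price in VERTICAL_TIERS:
        if capacity >= rps:
            return price
    return None  # the wall: no bigger box to buy


def horizontal_cost(rps):
    """Price of enough small boxes to carry the load."""
    return max(MIN_BOXES, math.ceil(rps / BOX_RPS)) * BOX_COST


if __name__ == "__main__":
    for rps in (1_000, 10_000, 100_000, 500_000, 1_000_000):
        v = vertical_cost(rps)
        v_text = f"${v:>10,.0f}" if v is not None else "  no such box"
        print(f"{rps:>9,} req/s  vertical {v_text}  horizontal ${horizontal_cost(rps):>7,}")
```

In this model vertical wins only at the smallest load, falls behind as tiers compound, and returns nothing at 1M req/s. The exact crossover depends entirely on the assumed prices, so treat the output as a shape, not a quote.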

This is why every system at internet scale is horizontal. Not because vertical is wrong — for small teams it's often right. But because the wall is real, and pretending it isn't is the most expensive way to discover it. Most mature systems use both: scale up while you can, then scale out when you must. Knowing where that line is — that's the engineering judgment.

§ 06 — Eight words for the scale layer

Vocabulary,
for growing pains.

You'll see these in every capacity-planning meeting from here on. Learn them.

Vertical Scaling
/ˈvɜːtɪkəl/
Adding power to one machine — more CPU, more RAM. Also called "scaling up." Simple, no code changes, but bounded by what a single machine can be.
Horizontal Scaling
/ˌhɒrɪˈzɒntəl/
Adding more machines. Also called "scaling out." Practically unlimited, fault-tolerant, but requires stateless design and coordination.
Stateless
/ˈsteɪtləs/
A server that doesn't remember anything between requests. Each request carries everything needed. The prerequisite for horizontal scaling.
SPOF
/ˌɛs piː oʊ ˈɛf/
"Single Point of Failure." Any component whose death takes the system down. The defining weakness of vertical scaling; eliminating SPOFs is one of the main goals of horizontal scaling.
Replica
/ˈrɛplɪkə/
A copy of a server (or database) running independently. With N replicas, each one handles roughly 1/N of the traffic, and the system gains redundancy. Read replicas are easy to add; spreading writes across machines is much harder.
Sharding
/ˈʃɑːdɪŋ/
Splitting one logical dataset across many machines by some key (user_id, region, hash). The horizontal-scaling answer for databases — and one of the hardest things in computing.
Capacity Planning
/kəˈpæsɪti/
The discipline of forecasting load and provisioning ahead of it. "What if traffic 5×s next month?" Better answered before traffic actually 5×s.
Auto-scaling
/ˈɔːtəʊ skeɪlɪŋ/
Adding/removing horizontal servers automatically based on metrics (CPU, queue depth). Lets you cope with variable load without overprovisioning 24/7. Standard cloud feature.
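
Most auto-scalers boil down to a proportional rule: if the fleet is running hotter than the target, grow it by the same ratio. The sketch below uses the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × metric / target). The function name, the CPU-percent inputs, and the 2 to 50 replica bounds are assumptions for illustration.

```python
import math


def desired_replicas(current: int, metric_pct: int, target_pct: int,
                     min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Proportional scaling rule, clamped to a configured range."""
    raw = math.ceil(current * metric_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(4, metric_pct=90, target_pct=60))    # 6: running hot, scale out
    print(desired_replicas(10, metric_pct=20, target_pct=60))   # 4: mostly idle, scale in
    print(desired_replicas(40, metric_pct=120, target_pct=60))  # 50: capped at the maximum
```

The clamp matters as much as the formula: the minimum keeps redundancy when traffic is quiet, and the maximum keeps a bad metric or a traffic flood from running up an unbounded bill.
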
§ 07 — Knowledge check

Five questions.
Pick your future.

Test the scaling intuition you just built. Click an answer; explanation appears immediately.


§ 08 — The recap

Three ideas to
carry forward.

This module shapes how every scaling conversation goes for the rest of your career.

i

Up is easy, out is unlimited

Vertical scaling: simple but bounded. Horizontal scaling: complex but practically infinite. Pick based on where you are on the curve.

ii

Stateless unlocks horizontal

Move sessions, locks, and counters out of your app servers. Make every request self-contained. Then horizontal becomes trivial.

iii

Mature systems do both

Start vertical. Switch to horizontal when the price curve forces you. Databases are a separate, harder problem; we'll come back to that.

↓ UP NEXT

M.10 — Load Balancing
intro.

Horizontal scaling needs a director. Something has to decide which of your N servers gets each request. Enter the load balancer — the unsung hero of every modern system. Time to look inside.

Continue to Module 10 →