Horizontal vs Vertical Scaling — Sharding and Replication

§ 01 — The moment of truth

Remember the
switchboard?

Back in Module 02, you cranked up the clients on a single server and watched the queue overflow. That was the universe telling you something: every single machine has a ceiling. CPU runs out. Memory fills up. The network link saturates. Disk I/O becomes the bottleneck. Once you hit any one of these, your perfectly designed system stops responding. The question isn't whether you'll hit that wall; it's what you do when you get there.

// ONE SERVER · TWO POSSIBLE FUTURES

// 9:00 AM · Tuesday morning

Traffic 2,000 req/s · CPU 45% · p99 latency 120 ms · Status: healthy

// 9:00 PM · launch · Tuesday night

Traffic 80,000 req/s · CPU 100% (pinned) · p99 latency ∞ (timeouts) · Status: on fire

The fundamental insight is this: server performance doesn't degrade gradually. Machines saturate, and then they break. A server at 80% CPU runs fine. A server at 99% CPU is on the edge, because requests start arriving faster than they drain. A server at 100% is effectively unreachable: the queue only grows, and requests time out. There's no smooth degradation. So when traffic grows (and it always does), you need a plan. Two plans exist, and they lead to different futures.
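To see why latency explodes instead of creeping up, a standard queueing model helps. The sketch below assumes an M/M/1 queue (random arrivals, a single server, random service times), which is a simplification of a real machine, and the 10 ms service time is a hypothetical figure. In that model, the mean time a request spends in the system is the service time divided by (1 − utilization).

```python
def mm1_mean_latency_ms(service_time_ms: float, utilization: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue.

    W = S / (1 - rho), where S is the mean service time and rho the utilization.
    The formula only holds for 0 <= rho < 1; at rho >= 1 the queue grows forever.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue never drains")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical server: each request needs 10 ms of work on average.
    for rho in (0.45, 0.80, 0.90, 0.99):
        latency = mm1_mean_latency_ms(10.0, rho)
        print(f"utilization {rho:.0%}: mean latency ~{latency:.0f} ms")
```

Under these assumptions the mean goes from about 18 ms at 45% utilization to 50 ms at 80%, 100 ms at 90%, and roughly a full second at 99%. Each step toward 100% costs more than the last.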

§ 02 — The two answers

Scale up.
Or scale out.

There are two basic ways to handle more load. You either get a bigger machine (scale up, also called vertical scaling): more CPU, more RAM, faster disk on the same server. Or you get more machines (scale out, also called horizontal scaling): distribute work across many smaller servers. It sounds like a minor distinction. It isn't. The two lead to completely different architectures.

// SCALING UP vs SCALING OUT · THE METAPHOR

// VERTICAL · scale up

A taller tower

one machine, more powerful

Simple. Your code doesn't change. The OS sees more CPU and more RAM, and many apps go faster with no changes at all (single-threaded code won't benefit from extra cores, though).

No coordination needed. Still one machine — no distributed systems problems to solve.

Has a ceiling. Even AWS's biggest box has a fixed top end. You can't scale past the largest machine you can buy.

Single point of failure. One machine = one place to fail. When it does, everyone is down.

// HORIZONTAL · scale out

More towers

many machines, distributed work

Practically unlimited. Need 10× capacity? Run roughly 10× the boxes. Need 100×? Run roughly 100×, as long as shared dependencies like the database keep up.

Fault tolerant. One server dies — the other N-1 keep serving. Users barely notice.

Complex. Requires a load balancer. State must move out of memory. Distributed systems problems appear.

Some things resist. Databases especially — sharding a SQL database is one of the hardest things in computing.

Scale up: simpler today, ceiling tomorrow.
Scale out: harder today, no ceiling.
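The fault-tolerance claim above can be put in numbers. A minimal sketch, assuming each server is independently up with the same probability (real failures are often correlated: shared racks, bad deploys, region outages) and that you need at least k healthy servers to carry the load. The 99% per-server figure is hypothetical.

```python
from math import comb


def p_at_least_k_up(n: int, k: int, availability: float) -> float:
    """Probability that at least k of n independent servers are up.

    Each server is assumed to be up with the same probability `availability`,
    independently of the others (a binomial model).
    """
    return sum(
        comb(n, i) * availability**i * (1 - availability) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    a = 0.99  # hypothetical per-server availability
    print(f"1 big box:       {p_at_least_k_up(1, 1, a):.4%}")
    print(f"2 boxes, need 1: {p_at_least_k_up(2, 1, a):.4%}")
    print(f"3 boxes, need 2: {p_at_least_k_up(3, 2, a):.4%}")
```

The k parameter is the catch: redundancy only helps if the survivors can absorb the load, which is why capacity is usually planned with spare servers (N+1 or more).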

The right move is usually obvious in hindsight. Early-stage products scale up: it's cheaper, simpler, and you don't know yet whether your product will need anything else. Mature products scale out: traffic is high enough that no single machine could ever hold it, and you've earned the engineering investment by then. The art is knowing when to make the switch — usually before you have to.

§ 03 — Stateless wins, stateful loses

Not everything
scales equally.

Horizontal scaling has a giant hidden requirement: each request must be independent. If your code remembers things between requests — sessions in memory, locks, in-process counters — those memories live on one specific server. Send the next request to a different server, and the memory is gone. This is the difference between stateless and stateful systems, and it determines what scales easily and what doesn't.
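A toy simulation makes the failure concrete. Everything here is hypothetical: `AppServer` stands in for an app server, a plain dict stands in for either process memory or an external store such as Redis, and `itertools.cycle` plays a naive round-robin load balancer.

```python
import itertools
import uuid


class AppServer:
    """Hypothetical app server that keeps sessions in whatever dict it is given."""

    def __init__(self, name: str, session_store: dict):
        self.name = name
        self.sessions = session_store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = user
        return session_id

    def whoami(self, session_id: str):
        # None means "no such session" as far as *this* server can tell.
        return self.sessions.get(session_id)


def simulate(shared: bool):
    external_store = {}  # stands in for a shared store such as Redis
    servers = [
        AppServer(f"app-{i}", external_store if shared else {})  # {} = private memory
        for i in range(3)
    ]
    balancer = itertools.cycle(servers)  # naive round-robin
    session_id = next(balancer).login("alice")  # request 1 lands on app-0
    return next(balancer).whoami(session_id)    # request 2 lands on app-1


if __name__ == "__main__":
    print(simulate(shared=False))  # None: app-1 never saw the session
    print(simulate(shared=True))   # alice
```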

// HOW EASILY EACH PART SCALES HORIZONTALLY

Static files // CSS, JS, images

Identical bytes on every server. Add 100 web servers, no coordination needed — each one can serve the same files. CDNs make this even easier.

TRIVIAL

Stateless app servers // REST API, business logic

Each request carries its own auth token, reads from a shared DB, and returns. Servers don't need to know about each other. Designing this way is the prerequisite for horizontal scale.

EASY

Read-only databases // read replicas

Spawn copies, point each one at the primary, balance reads across them. Writes still go to one place, but most workloads are read-heavy, so this absorbs most growth.

MODERATE

In-memory caches // Redis, Memcached

Sharding a cache is possible (consistent hashing, etc.) but adds complexity. Each cache node holds different keys; route a key to the wrong node and you get misses or stale reads.

MODERATE

Write-heavy databases // the primary write node

Sharding a SQL database — splitting one logical DB across N machines by key — is famously hard. Cross-shard queries, joins, transactions, and rebalancing all become painful.

HARD

Stateful sessions // in-memory user data

A login session held in app server memory means subsequent requests must hit the same server. Breaks load balancing. Move sessions to Redis or a token — this is the most common refactor before going horizontal.

HARD
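The cache and database rows share one underlying question: which node owns which key, and how many keys move when the node count changes? Below is a minimal sketch comparing naive `hash(key) % N` placement with a consistent-hash ring. The node names, the 100 virtual nodes per server, and MD5 (used only as a fast, stable, non-security hash) are illustrative choices, not any specific product's implementation.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process for str, so use a
    # deterministic digest instead (MD5 here only for spreading, not security).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_node(key: str, nodes: list) -> str:
    """Naive placement: hash mod N. Changing N reshuffles most keys."""
    return nodes[stable_hash(key) % len(nodes)]


class ConsistentHashRing:
    """Each node owns many points on a ring; a key goes to the next point clockwise."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring point strictly after the key's hash, wrapping around to 0.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def fraction_moved(before, after, keys) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    old = ["cache-a", "cache-b", "cache-c"]
    new = old + ["cache-d"]

    moved_mod = fraction_moved(lambda k: modulo_node(k, old), lambda k: modulo_node(k, new), keys)
    ring_old, ring_new = ConsistentHashRing(old), ConsistentHashRing(new)
    moved_ring = fraction_moved(ring_old.node_for, ring_new.node_for, keys)

    print(f"modulo 3 -> 4 nodes: {moved_mod:.0%} of keys moved")
    print(f"ring   3 -> 4 nodes: {moved_ring:.0%} of keys moved")
```

With a uniform hash, roughly three quarters of keys move under modulo placement (a key stays only when `h % 3 == h % 4`), versus roughly a quarter with the ring, all of them landing on the new node. For a cache, a moved key is a miss; for a sharded database, it is data that has to be copied, which is a big part of why rebalancing hurts.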

The lesson: horizontal scaling rewards stateless design. Push all the "remembering" out of your app servers (into databases, caches, and tokens) and your app tier becomes trivial to scale. Keep it in memory and you've trapped yourself on one box. This is why the Twelve-Factor App methodology preaches stateless processes. It's also why scaling databases is a separate, harder problem we'll touch on later.

§ 04 — Scaling decision lab · interactive

Now feel the
two paths diverge.

Below: a side-by-side view of both strategies under the same traffic. Pick a load level — light to massive — and watch the vertical box swell, the horizontal grid multiply. Then read the numbers. By the time you hit 1M req/s, only one of these paths is still viable.

SCALING_DECISION.SIM // m.09 lab

TRAFFIC: 1K req/s

⌘ VERTICAL · SCALE UP

A bigger box

one server, more powerful

Instance type: t3.large

vCPU / RAM: 2 / 8 GB

Monthly cost: $50

Failure tolerance: None (SPOF)

✓ Handles it comfortably

⌘ HORIZONTAL · SCALE OUT

More boxes

many small servers behind a load balancer

Server count: 2

Each (vCPU / RAM): 2 / 8 GB

Monthly cost: $120

Failure tolerance: 1 server can die

✓ Handles it · redundant

// THE VERDICT AT THIS SCALE

Light traffic — either works

At 1K req/s, both strategies are viable. Vertical is cheaper ($50 for one box vs $120 for two behind a load balancer), but horizontal already buys you redundancy. Most teams start here with vertical scaling: it's the simpler path until you have a reason to change.
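The horizontal column's server count comes from simple capacity arithmetic. A minimal sketch, assuming a hypothetical per-server capacity of 1,500 req/s, a target utilization below 100% so latency stays reasonable, and one spare server so a single failure is survivable:

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float,
                   target_utilization: float = 0.7, spares: int = 1) -> int:
    """Servers required for peak load at the target utilization, plus spares (N+spares).

    All inputs are planning assumptions; measure rps_per_server with a load test.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_server = rps_per_server * target_utilization
    base = max(1, math.ceil(peak_rps / usable_per_server))
    return base + spares


if __name__ == "__main__":
    # Hypothetical: one small server sustains ~1,500 req/s for this workload.
    for peak in (1_000, 80_000, 1_000_000):
        print(f"{peak:>9,} req/s -> {servers_needed(peak, 1_500)} servers")
```

At 1K req/s these assumptions give two servers: one to carry the load and one spare. Beyond that, the count (and so the cost) grows roughly in step with traffic, since each server adds a fixed slice of capacity.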

§ 05 — The cost curve

Why everyone
eventually goes horizontal.

For small traffic, vertical scaling is cheaper: one $50 box beats a pair of boxes behind a load balancer. But past a certain size, the vertical cost curve turns brutally non-linear. The top tiers (high-memory instances, specialty big-iron hardware) carry a steep premium per unit of performance, so a 16× bigger server can cost far more than 16× as much, and beyond the largest tier there is nothing left to buy. Cloud providers know exactly how valuable that top-tier box is, and price it accordingly.

// COST PER MONTH AS TRAFFIC GROWS · APPROXIMATE

Look at where the lines diverge. Up to about 10K req/s they're nearly identical. At 100K, vertical is starting to balloon — you're buying expensive big-iron tiers. At 500K, vertical is hitting six figures monthly. At 1M req/s, vertical literally can't get there — no single off-the-shelf machine can do it. Horizontal stays roughly linear: 2× the traffic ≈ 2× the boxes ≈ 2× the cost.

This is why every system at internet scale is horizontal. Not because vertical is wrong — for small teams it's often right. But because the wall is real, and pretending it isn't is the most expensive way to discover it. Most mature systems use both: scale up while you can, then scale out when you must. Knowing where that line is — that's the engineering judgment.

§ 06 — Eight words for the scale layer

Vocabulary,
for growing pains.

You'll see these in every capacity-planning meeting from here on. Learn them.

Vertical Scaling

/ˈvɜːtɪkəl/

Adding power to one machine — more CPU, more RAM. Also called "scaling up." Simple, no code changes, but bounded by what a single machine can be.

Horizontal Scaling

/ˌhɒrɪˈzɒntəl/

Adding more machines. Also called "scaling out." Practically unlimited, fault-tolerant, but requires stateless design and coordination.

Stateless

/ˈsteɪtləs/

A server that doesn't remember anything between requests. Each request carries everything needed. The prerequisite for horizontal scaling.

SPOF

/ˌɛs piː oʊ ˈɛf/

"Single Point of Failure." Any component whose failure takes the whole system down. The defining weakness of vertical scaling, and eliminating SPOFs is a core goal of horizontal scaling.

Replica

/ˈrɛplɪkə/

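A copy of a server (or database) running independently. For reads, N replicas split the traffic roughly N ways and add redundancy. In a typical primary-replica database setup, though, replicas serve only reads: every write still goes to the primary, so adding replicas grows read capacity, not write capacity.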

Sharding

/ˈʃɑːdɪŋ/

Splitting one logical dataset across many machines by some key (user_id, region, hash). The horizontal-scaling answer for databases — and one of the hardest things in computing.

Capacity Planning

/kəˈpæsɪti/

The discipline of forecasting load and provisioning ahead of it. "What if traffic grows 5× next month?" is a question better answered before it actually does.

Auto-scaling

/ˈɔːtəʊ skeɪlɪŋ/

Adding and removing servers automatically based on metrics (CPU, queue depth). Lets you handle variable load without overprovisioning 24/7. A standard cloud feature.
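Most autoscalers use a proportional rule. Kubernetes' Horizontal Pod Autoscaler, for example, documents desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric) and skips changes when that ratio is within a tolerance (0.1 by default). The sketch below mirrors that rule; the min/max bounds and sample numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 50,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA.

    current_metric and target_metric are per-replica averages, e.g. CPU %.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical: 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))  # 6
```

Real autoscalers layer stabilization windows and cooldowns on top of this so a brief spike doesn't trigger a burst of scale-up followed by scale-down.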

§ 07 — Knowledge check

Five questions.
Pick your future.

Test the scaling intuition you just built. Click an answer; explanation appears immediately.


Architect.

You see the two futures clearly. Next stop: load balancing — how horizontal scaling actually distributes the work.

§ 08 — The recap

Three ideas to
carry forward.

This module shapes how every scaling conversation goes for the rest of your career.

i

Up is easy, out is unlimited

Vertical scaling: simple but bounded. Horizontal scaling: complex but practically infinite. Pick based on where you are on the curve.

ii

Stateless unlocks horizontal

Move sessions, locks, and counters out of your app servers. Make every request self-contained. Then horizontal becomes trivial.

iii

Mature systems do both

Start vertical. Switch to horizontal when the price curve forces you. Databases are a separate, harder problem; we'll come back to that.
