Module 18 / 20 · Phase D — Build & Interview · 45 min

Build an image
sharing app.

Same exercise as last module, completely different constraints. Now the data isn't tiny strings; it's megabytes per item. The read/write ratio shifts. The bill is real money. And the architecture decisions you make in the first 10 minutes determine whether your cloud bill is hundreds or hundreds of thousands.

// What you'll know by the end

  • Why object storage exists and when to use it
  • How to separate metadata from binary blobs
  • Why async processing is mandatory for media
  • The dollar weight of each architecture choice
§ 01 — A wildly different problem

Same brief.
Different system.

Last module: a URL shortener. Tiny data, gigantic read volume, cache-everything strategy. This module: an image sharing app. The product looks superficially similar (both have an upload step and a fetch step), but every interesting constraint changes. URLs are ~100 bytes; a 3 MB photo is roughly 30,000 times larger. Shortener writes are rare and reads dominate; image apps see a much larger share of writes (people uploading) and very different read patterns (feed scrolling, profile views). Caching helps differently. Storage costs become real. The mental model from M.17 only takes you halfway here.

// SAME TEMPLATE · TOTALLY DIFFERENT ANSWERS
DIMENSION        | URL SHORTENER (M.17)            | IMAGE SHARING APP (M.18)
Avg item size    | ~100 bytes (text URL)           | ~3 MB original (30,000× bigger)
Read/write ratio | ~100:1 (heavily read-skewed)    | ~10:1 (more balanced; feed scrolling vs upload)
Storage system   | SQL DB is fine for everything   | Split: SQL for metadata, S3 for blobs
CDN              | Nice optimization, optional     | Mandatory. Origin egress costs would kill you
Processing       | Generate code, save, done       | Async pipeline: resize, optimize, transcode variants
Cost shape       | Tiny storage, lots of CPU       | Massive storage + egress; CPU almost free in comparison

The lesson encoded in this comparison is something senior engineers internalize early: system design is not a generic skill. The patterns you apply depend on the shape of the data. A "build me a system that does X" question is only the start; the second sentence — "and the data looks like this" — is what determines whether you reach for Redis, S3, Kafka, or Postgres. Every architecture begins with the data.

Architecture begins where the data shape ends.

Across the next four sections we'll re-run the same playbook from M.17 — requirements, estimation, decisions, lab — but applied to this very different beast. By the end you'll have a feel for when each pattern fits, and a calibrated sense of how much money each design decision actually costs.

§ 02 — Scope & estimation

Set the brief,
do the math.

The framework is the same: write functional and non-functional requirements first, then do back-of-envelope sizing. The numbers themselves will tell us where the hard problems are. Let's scope for an Instagram-like app at modest scale — 100K daily uploads, ~10× viewing ratio.

// REQUIREMENTS · IMAGE SHARING APP

// FUNCTIONAL
What it does
  • Upload an image. POST /images
  • View images at multiple sizes. GET /images/:id
  • Feed showing recent uploads
  • Optional metadata: caption, location, tags
  • Delete own images
// NON-FUNCTIONAL
How well
  • Upload latency under 2s for a typical phone photo
  • View latency under 200ms globally
  • Durability — never lose a user's images (11 nines)
  • Handle spiky uploads (events, viral moments)
  • Storage cost doesn't grow superlinearly with users

Two things to note immediately. First, durability matters more than availability here: a few seconds of outage is annoying; losing a user's wedding photos forever is unacceptable. This pushes us toward storage systems with deep replication built in (S3 Standard stores each object redundantly across at least three Availability Zones). Second, upload latency is what users feel as "fast": they tap the upload button and want to see "uploaded ✓" almost instantly. The actual heavy lifting (resize, optimize, transcode) can happen asynchronously afterward.

// SIZING THE SYSTEM · 100K UPLOADS/DAY

QUANTITY | CALCULATION | RESULT
Daily uploads (our scale point) | given · Instagram-modest | 100K / day
Upload write rate (writes/sec) | 100K ÷ 86,400 ≈ 1.2/sec avg; peak 10× = 12/sec | ~12 / sec peak
Storage per upload (original + 3 variants) | thumb 50 KB + medium 200 KB + full 1 MB + original 3 MB | ~4.25 MB
Daily storage growth (raw bytes added each day) | 100K × 4.25 MB | 425 GB / day
1-year accumulated (total bytes stored) | 425 GB × 365 | ~155 TB / year
View bandwidth (users scrolling feeds, etc.) | 100K × 10 views × 250 KB avg = 250 GB / day egress | ~7.5 TB / month

Those numbers immediately reshape what we worry about. 155 TB/year of permanent storage means we're not putting this in MySQL. At $0.023/GB/month for cloud object storage, a full year of uploads costs about $3,600/month to keep (155,000 GB × $0.023); holding the same bytes in a SQL database would cost on the order of 10× more. 7.5 TB/month of view traffic means egress is going to be one of our biggest line items if we serve users directly from origin. CDN is no longer "nice to have"; it's mandatory. The math tells us so before we've drawn anything.
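The sizing arithmetic above is easy to re-run as a small script, which helps when the scale point changes. This is a minimal sketch: the constants are the ones from the sizing table, the function and key names are made up for illustration, and units are decimal (1 GB = 10^9 bytes) to match the table.

```python
# Back-of-envelope sizing for the image app, in decimal units.
KB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12

UPLOADS_PER_DAY = 100_000
VIEWS_PER_UPLOAD = 10
PEAK_FACTOR = 10
AVG_VIEW_BYTES = 250 * KB
STORAGE_USD_PER_GB_MONTH = 0.023
VARIANT_BYTES = {
    "thumb": 50 * KB,
    "medium": 200 * KB,
    "full": 1 * MB,
    "original": 3 * MB,
}


def size_system() -> dict:
    """Return the headline sizing numbers for the stated scale point."""
    per_upload = sum(VARIANT_BYTES.values())
    daily_growth = UPLOADS_PER_DAY * per_upload
    year_one = daily_growth * 365
    daily_egress = UPLOADS_PER_DAY * VIEWS_PER_UPLOAD * AVG_VIEW_BYTES
    avg_writes = UPLOADS_PER_DAY / 86_400
    return {
        "avg_writes_per_sec": avg_writes,
        "peak_writes_per_sec": avg_writes * PEAK_FACTOR,
        "per_upload_mb": per_upload / MB,
        "daily_growth_gb": daily_growth / GB,
        "year_one_tb": year_one / TB,
        "monthly_egress_tb": daily_egress * 30 / TB,
        # Monthly storage bill once a full year of uploads has accumulated.
        "storage_usd_per_month_after_year_one": year_one / GB * STORAGE_USD_PER_GB_MONTH,
    }


if __name__ == "__main__":
    for name, value in size_system().items():
        print(f"{name}: {value:,.2f}")
```

Changing `UPLOADS_PER_DAY` or a variant size and re-running shows which line items move; storage and egress scale linearly with uploads, so a 10× bigger app means a 10× bigger bill on both.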

§ 03 — The metadata / blob split

SQL for the row.
S3 for the file.

The biggest architectural decision in any media-heavy system is this one: where does the binary data live, and where does the structured info about it live? The naive answer — "put it all in MySQL" — works for ten images and breaks for ten million. The right answer separates the two. Tiny structured metadata (rows, indexed, queryable) goes in your SQL database. Large binary blobs (the actual JPEG bytes) go in object storage: S3, GCS, or equivalent. The metadata row just holds the URL or key pointing to the blob.

// SAME IMAGE · TWO PLACES · ONE LINK

// SQL · METADATA ROW
Small, indexed, queryable
CREATE TABLE images (
  id          UUID PRIMARY KEY,
  user_id     UUID,
  uploaded_at TIMESTAMPTZ,
  caption     TEXT,
  width       INT,
  height      INT,
  thumb_key   VARCHAR(255),  -- e.g. "2024/01/15/abc123-thumb.jpg"
  medium_key  VARCHAR(255),
  full_key    VARCHAR(255),
  original_key VARCHAR(255),
  status      VARCHAR(20)    -- 'processing', 'ready', 'failed'
);
CREATE INDEX idx_user_recent
  ON images(user_id, uploaded_at DESC);

~500 bytes per row. 100M images = 50 GB. Fits in one Postgres box comfortably. The composite index serves the query that matters most, "this user's recent images"; a cross-user query such as "all images uploaded in the last 24h" would need a separate index on uploaded_at.
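To see the metadata side in action, here is a small runnable sketch. It assumes Python's built-in `sqlite3` as a stand-in for Postgres, so ids and timestamps are plain text rather than UUID and TIMESTAMPTZ, and only a subset of the columns is kept. The helper names `make_db` and `recent_ready_images` are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory images table shaped like the metadata row above."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE images (
            id           TEXT PRIMARY KEY,
            user_id      TEXT NOT NULL,
            uploaded_at  TEXT NOT NULL,   -- ISO-8601 strings sort chronologically
            caption      TEXT,
            thumb_key    TEXT,            -- object-store key, not the bytes
            original_key TEXT,
            status       TEXT NOT NULL    -- 'processing', 'ready', 'failed'
        );
        CREATE INDEX idx_user_recent ON images(user_id, uploaded_at DESC);
        """
    )
    return db


def recent_ready_images(db: sqlite3.Connection, user_id: str, limit: int = 20):
    """Newest-first (id, thumb_key) pairs for one user's finished images."""
    return db.execute(
        "SELECT id, thumb_key FROM images "
        "WHERE user_id = ? AND status = 'ready' "
        "ORDER BY uploaded_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


db = make_db()
db.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("img-1", "alice", "2024-01-15T09:00:00Z", "beach",
         "2024/01/15/img-1-thumb.jpg", "2024/01/15/img-1-original.jpg", "ready"),
        ("img-2", "alice", "2024-01-16T12:30:00Z", None,
         "2024/01/16/img-2-thumb.jpg", "2024/01/16/img-2-original.jpg", "ready"),
        ("img-3", "alice", "2024-01-16T18:00:00Z", None,
         None, "2024/01/16/img-3-original.jpg", "processing"),
        ("img-4", "bob", "2024-01-16T19:00:00Z", None,
         "2024/01/16/img-4-thumb.jpg", "2024/01/16/img-4-original.jpg", "ready"),
    ],
)
recent = recent_ready_images(db, "alice")
print(recent)
```

The query never touches image bytes: it returns keys, and the client (or CDN) resolves those keys against the object store. Rows still in `processing` are filtered out so half-finished uploads never reach a feed.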

// OBJECT STORE · BINARY DATA
Huge, opaque, dirt-cheap
# S3 bucket layout
s3://app-images/
  ├── 2024/01/15/
  │     ├── abc123-thumb.jpg    (50 KB)
  │     ├── abc123-medium.jpg   (200 KB)
  │     ├── abc123-full.jpg     (1 MB)
  │     └── abc123-original.jpg (3 MB)
  └── 2024/01/16/...

# Object stores:
# - replicate across 3+ AZs automatically
# - 11 nines of durability
# - $0.023/GB/month (way cheaper than EBS)
# - infinitely scalable, no resizing

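The key in S3 plays the role of a file path (the namespace is actually flat; the slashes are just characters in the key). The metadata row holds these keys; the app fetches the objects from S3 on demand. S3 doesn't care if you have 10 objects or 10 trillion: same API, same per-GB pricing.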
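Because the metadata row stores one key per variant, it helps to derive all of them from a single function so the upload service and the workers can never disagree on naming. The sketch below follows the date-prefixed bucket layout shown above; the function name and the fixed `.jpg` extension are assumptions for illustration.

```python
from datetime import date

# Variant names taken from the bucket layout above.
VARIANTS = ("thumb", "medium", "full", "original")


def variant_keys(image_id: str, uploaded_on: date) -> dict:
    """Build the object-store key for every variant of one image."""
    prefix = uploaded_on.strftime("%Y/%m/%d")
    return {name: f"{prefix}/{image_id}-{name}.jpg" for name in VARIANTS}


keys = variant_keys("abc123", date(2024, 1, 15))
print(keys["thumb"])  # 2024/01/15/abc123-thumb.jpg
```

These are the strings that would land in `thumb_key`, `medium_key`, `full_key`, and `original_key` on the metadata row.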

This split unlocks several wins. Your SQL DB stays tiny — 50 GB of metadata is trivial for Postgres; that same Postgres holding 155 TB of JPEG blobs would be a nightmare to back up, index, and replicate. Object storage scales without operational pain — you don't provision S3 capacity; you just put more objects in. Pricing is predictable — pay per GB stored and per GB transferred, no overhead for "the box we'd need to host all this."

Never put binary blobs in a relational database. Object storage exists for exactly this reason.

The same pattern applies to videos, audio files, PDFs, backups, and any other "big opaque thing." Once you internalize the metadata/blob split, you'll see it everywhere. The rule of thumb: if it's queryable structured info, SQL. If it's a chunk of bytes you only ever fetch whole, object storage. Mix them at your peril.

§ 04 — Storage tier visualizer · interactive lab

Now feel the
monthly bill.

Below: an interactive cost calculator. Pick your daily upload volume, the variants you'll generate, the storage class for originals, and whether you use a CDN. The architecture diagram updates and the live cost numbers show what each decision actually costs in dollars per month. Watch what happens when you turn off the CDN. Or move originals to Glacier. Or skip the medium variant. System design is sometimes engineering and sometimes accounting.

IMAGE_STORAGE.SIM // m.18 lab
// PIPELINE & STORAGE TIERS
[Diagram: 📷 uploader → UPLOAD service → QUEUE (Kafka) → WORKER (resize · optimize) → S3 OBJECT STORE → CDN · GLOBAL EDGES (95% hit rate · low egress cost) → 👀 viewer (10 views per upload), with a SQL metadata store alongside. Objects in S3: THUMB 50 KB · MEDIUM 200 KB · FULL 1 MB · ORIGINAL 3 MB, all Standard class.]
// CONTROLS
Daily uploads · Variants generated · Originals storage class · CDN for viewing
// VARIANTS
Thumbnail 50 KB · Medium 200 KB · Full size 1 MB · Original (always) 3 MB
// MONTHLY BILL · STEADY STATE (YEAR 1)
Stored: 12.7 TB · Egress: 7.5 TB · Per upload: 4.25 MB
Storage (S3): $293
Egress (CDN/origin): $76
SQL metadata (RDS): $45
Total / month: $414
// VERDICT
All four variants · Standard storage · CDN enabled

Baseline configuration. $414/month for 100K daily uploads with full quality variants and CDN. Try: turn off the CDN. Watch egress jump from $76 to $675. Try: move originals to Glacier. Storage drops by ~70%. Each toggle has a real-money consequence — and that's the senior-engineer lens for system design.
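The lab's baseline and its CDN toggle can be approximated in a few lines. This is a rough sketch, not the lab's actual formula: the $0.023/GB storage rate comes from the text, while the two egress rates are assumptions back-derived from the lab's own figures ($675 ÷ 7,500 GB ≈ $0.09/GB from origin, $76 ÷ 7,500 GB ≈ $0.01/GB through the CDN). The function name and parameters are hypothetical.

```python
STORAGE_USD_PER_GB = 0.023        # from the text
ORIGIN_EGRESS_USD_PER_GB = 0.09   # assumption, implied by the lab's $675
CDN_EGRESS_USD_PER_GB = 0.01      # assumption, implied by the lab's $76


def monthly_bill(
    uploads_per_day: int = 100_000,
    mb_per_upload: float = 4.25,
    views_per_upload: int = 10,
    kb_per_view: float = 250,
    use_cdn: bool = True,
    sql_usd: float = 45.0,
    days: int = 30,
) -> dict:
    """Estimate one month's bill for one month of accumulated uploads."""
    stored_gb = uploads_per_day * mb_per_upload * days / 1_000
    egress_gb = uploads_per_day * views_per_upload * kb_per_view * days / 1_000_000
    rate = CDN_EGRESS_USD_PER_GB if use_cdn else ORIGIN_EGRESS_USD_PER_GB
    storage = stored_gb * STORAGE_USD_PER_GB
    egress = egress_gb * rate
    return {
        "stored_gb": stored_gb,
        "egress_gb": egress_gb,
        "storage_usd": round(storage, 2),
        "egress_usd": round(egress, 2),
        "sql_usd": sql_usd,
        "total_usd": round(storage + egress + sql_usd, 2),
    }


with_cdn = monthly_bill(use_cdn=True)
without_cdn = monthly_bill(use_cdn=False)
print(with_cdn["total_usd"], without_cdn["total_usd"])
```

With these assumed rates the baseline comes out within about a dollar of the lab's $414, and switching the CDN off adds roughly $600 a month of egress. Note that the stored volume here is one month of uploads; the storage line keeps growing every month that uploads accumulate.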

§ 05 — Why upload returns fast

The async pipeline
that makes it feel instant.

One thing the lab visualizes but doesn't dwell on: the upload flow is asynchronous. When the user taps "upload," they need a "success ✓" within a second or two. But resizing a 3 MB photo into three smaller variants takes seconds per image, sometimes longer for tricky formats. Doing that work synchronously would make uploads feel painfully slow. So we split the flow into two halves: the synchronous "your bytes are safe" half, and the asynchronous "we're processing it" half.

// THE TWO-PHASE UPLOAD · USER-FACING vs BACKGROUND

[Diagram: the two-phase upload.]
// SYNCHRONOUS · < 2 SECONDS
📷 user → UPLOAD service → S3 (original only) → user gets 201 Created, "upload successful · processing..."
// ASYNCHRONOUS · SECONDS-MINUTES
UPLOAD service publishes to QUEUE (Kafka) → WORKERS (resize · optimize) → S3 (all variants) → SQL status=ready. This happens after the user moves on; the image appears in the feed when ready.
User waits only for: bytes uploaded + DB row inserted. Everything else is fire-and-forget.
// total perceived latency: ~1.5s · actual variant generation: 5-30s in background

The pattern shows up everywhere: any time a user action triggers heavy work, you split it. The synchronous half does the minimum required for the response: store the bytes, insert the row, publish an event, return success. The asynchronous half consumes that event from a queue and does the rest in the background, updating the row's status as it goes (processing → ready, or failed if something blew up). The client polls for status updates or receives a push notification once the image is ready.
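The two halves can be sketched end to end with nothing but the standard library. Everything here is a stand-in chosen for illustration: plain dicts play the roles of S3 and the SQL table, `queue.Queue` plays Kafka, and `fake_resize` is a placeholder where a real worker would call an image library. All names are hypothetical.

```python
import queue
import threading
import uuid

blob_store = {}        # stand-in for S3: key -> bytes
metadata = {}          # stand-in for the SQL images table: id -> row
jobs = queue.Queue()   # stand-in for the Kafka topic


def handle_upload(user_id: str, data: bytes) -> dict:
    """Synchronous half: store bytes, insert row, publish event, respond."""
    image_id = uuid.uuid4().hex
    original_key = f"{image_id}-original.jpg"
    blob_store[original_key] = data
    metadata[image_id] = {
        "user_id": user_id,
        "original_key": original_key,
        "status": "processing",
    }
    jobs.put(image_id)
    return {"id": image_id, "status": "processing"}  # the 201 Created body


def fake_resize(data: bytes, variant: str) -> bytes:
    """Placeholder for real image processing."""
    return variant.encode() + b":" + data[:16]


def worker() -> None:
    """Asynchronous half: build variants, then flip the row's status."""
    while True:
        image_id = jobs.get()
        row = metadata[image_id]
        try:
            original = blob_store[row["original_key"]]
            for variant in ("thumb", "medium", "full"):
                key = f"{image_id}-{variant}.jpg"
                blob_store[key] = fake_resize(original, variant)
                row[f"{variant}_key"] = key
            row["status"] = "ready"
        except Exception:
            row["status"] = "failed"
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

response = handle_upload("user-1", b"raw jpeg bytes")
jobs.join()  # demo only: a real client would poll or wait for a push
print(response["status"], "->", metadata[response["id"]]["status"])
```

The response is built before any resizing starts, which is the whole point: the user's wait covers one blob write, one row insert, and one enqueue. A production version would also need retries and idempotent workers, since a queue may deliver the same event more than once.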

This is the same async pattern as analytics events in M.17, the same pattern as sending emails after a signup, the same pattern as generating PDFs of long reports. Once you've internalized "drop a message on a queue, let workers do the slow thing," it becomes the default tool for any work that doesn't need to block the user. The user experience stays snappy; the heavy lifting still happens; nobody waits.

§ 06 — Eight words for the storage tier

Vocabulary,
for the blob life.

These show up in every architecture review involving media, files, or large objects. Lock them in.

Object Storage
/ˈɒbdʒɪkt/
A storage system designed for large, opaque files stored under keys. S3, GCS, Azure Blob. Infinitely scalable, cheap, durable. The canonical home for media, backups, logs.
Variant / Derivative
/ˈvɛəriənt/
A processed version of an original — thumbnail, medium, full-size. Pre-generated and stored separately so reads serve the smallest acceptable size for the use case.
Storage Class
/klɑːs/
A tier on object storage trading cost for access speed. Standard (hot, most expensive to store), Infrequent Access (cooler, cheaper to store but with a per-GB retrieval fee), Glacier (cold, very cheap but slow retrieval).
Egress
/ˈiːɡrɛs/
Data leaving the cloud (to the public internet, to users, to other regions). Often the biggest line item in a cloud bill. Why CDNs exist.
Presigned URL
/priːˈsaɪnd/
A time-limited URL carrying a cryptographic signature rather than the credentials themselves, allowing a client to upload to (or download from) S3 directly. Avoids routing huge files through your app servers.
Durability
/ˌdjʊərəˈbɪlɪti/
The probability that stored data is preserved over time. S3 advertises "11 nines" — 99.999999999%. Different from availability (the probability of being readable right now).
Async Worker
/ˈeɪsɪŋk/
A background process consuming a queue of jobs. The thing that does slow work (resize images, send emails, compute reports) without blocking the user-facing path.
Lifecycle Policy
/ˈlaɪfˌsaɪkl/
A rule on object storage that automatically moves data between tiers based on age or access pattern. Move originals to Glacier after 90 days, delete after 7 years, etc.
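
Of the terms above, the presigned URL is the one whose mechanics are least obvious, so here is a toy version of the idea. This is not S3's actual signing scheme (real SDKs handle that, for example boto3's `generate_presigned_url`); it is a simplified HMAC sketch in which the secret, the host `storage.example.com`, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical; held by the server, never sent to clients


def _sign(method: str, key: str, expires: int) -> str:
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a URL that authorizes one method on one key until it expires."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _sign(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """What the storage side checks before accepting the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(sig, _sign(method, key, expires))


url = presign("PUT", "2024/01/15/abc123-original.jpg", ttl_seconds=300, now=1_000)
print(verify("PUT", url, now=1_100))  # inside the 5-minute window
```

The client only ever sees the signature, which is bound to one method, one key, and one expiry time. Changing any of those invalidates it, which is why the app server can hand the URL to a phone and let it upload straight to storage.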
§ 07 — Knowledge check

Five questions.
Watch the bill.

Test the media-system intuition. Click an answer; explanation drops in instantly.


Stored.

Perfect. You can think through media-heavy systems and feel the dollar weight of each decision. Next: reading the diagrams others have drawn — the C4 model.

§ 08 — The recap

Three ideas to
carry forward.

The media-heavy system patterns are reusable everywhere blobs show up.

i

Split metadata from blobs

Tiny structured rows in SQL. Big opaque bytes in object storage. The row holds the key; the blob lives wherever. Never mix them.

ii

Variants beat raw originals

Pre-generate sizes for each use case. Serve the smallest one a viewer needs. Saves egress, saves CDN budget, saves user bandwidth.

iii

Sync the minimum, async the rest

User waits only for "your bytes are safe." Resize, optimize, transcode all happen on a queue + worker pipeline. The user moved on.

↓ UP NEXT

M.19 — Reading
architecture diagrams.

You can now design systems. But most of your career, you'll read systems others designed. Time to learn the canonical visual language — the C4 model — so you can parse any architecture diagram in 60 seconds and contribute meaningfully to design conversations.

Continue to Module 19 →