Design an Image Sharing App — Instagram-Style System Design

§ 01 — A wildly different problem

Same brief.
Different system.

Last module: a URL shortener. Tiny data, gigantic read volume, cache-everything strategy. This module: an image sharing app. The products look superficially similar (both have an upload step and a fetch step), but every interesting constraint inverts. A URL is about 100 bytes; an original photo is about 3 MB, roughly 30,000 times larger. Shortener writes are rare and reads dominate; an image app takes heavy write traffic (every upload is megabytes) and has very different read patterns (feed scrolling, profile views). Caching helps differently. Storage costs become real. The mental model from M.17 only takes you halfway here.

// SAME TEMPLATE · TOTALLY DIFFERENT ANSWERS

Avg item size
  URL shortener: ~100 bytes (text URL)
  Image app: ~3 MB original (30,000× bigger)

Read/write ratio
  URL shortener: ~100:1 (heavily read-skewed)
  Image app: ~10:1 (more balanced; feed scrolling vs upload)

Storage system
  URL shortener: SQL DB is fine for everything
  Image app: Split: SQL for metadata, S3 for blobs

CDN
  URL shortener: Nice optimization, optional
  Image app: Mandatory. Origin egress costs would kill you

Processing
  URL shortener: Generate code, save, done
  Image app: Async pipeline: resize, optimize, transcode variants

Cost shape
  URL shortener: Tiny storage, lots of CPU
  Image app: Massive storage + egress; CPU almost free in comparison

The lesson encoded in this comparison is something senior engineers internalize early: system design is not a generic skill. The patterns you apply depend on the shape of the data. A "build me a system that does X" question is only the start; the second sentence — "and the data looks like this" — is what determines whether you reach for Redis, S3, Kafka, or Postgres. Every architecture begins with the data.

Architecture begins where the data shape ends.

Across the next four sections we'll re-run the same playbook from M.17 — requirements, estimation, decisions, lab — but applied to this very different beast. By the end you'll have a feel for when each pattern fits, and a calibrated sense of how much money each design decision actually costs.

§ 02 — Scope & estimation

Set the brief,
do the math.

The framework is the same: write functional and non-functional requirements first, then do back-of-envelope sizing. The numbers themselves will tell us where the hard problems are. Let's scope for an Instagram-like app at modest scale — 100K daily uploads, ~10× viewing ratio.

// REQUIREMENTS · IMAGE SHARING APP

// FUNCTIONAL

What it does

Upload an image. POST /images
View images at multiple sizes. GET /images/:id
Feed showing recent uploads
Optional metadata: caption, location, tags
Delete own images

// NON-FUNCTIONAL

How well

Upload latency under 2s for a typical phone photo
View latency under 200ms globally
Durability — never lose a user's images (11 nines)
Handle spiky uploads (events, viral moments)
Storage cost doesn't grow superlinearly with users

Two things to note immediately. First, durability matters more than availability here: a few seconds of outage is annoying; losing a user's wedding photos forever is unacceptable. This pushes us toward storage systems with deep replication built in (S3 Standard stores each object redundantly across at least three Availability Zones). Second, upload latency is what users feel as "fast": they tap the upload button and want to see "uploaded ✓" almost instantly. The actual heavy lifting (resize, optimize, transcode) can happen asynchronously afterward.

// SIZING THE SYSTEM · 100K UPLOADS/DAY

Daily uploads (our scale point)
  given · Instagram-modest
  = 100K / day

Upload write rate (writes per second)
  100K ÷ 86,400 ≈ 1.2/sec avg; peak 10× ≈ 12/sec
  = ~12 / sec peak

Storage per upload (original + 3 variants)
  thumb 50 KB + medium 200 KB + full 1 MB + original 3 MB
  = ~4.25 MB

Daily storage growth (raw bytes added each day)
  100K × 4.25 MB
  = 425 GB / day

1-year accumulated (total bytes stored)
  425 GB × 365
  = ~155 TB / year

View bandwidth (users scrolling feeds, etc.)
  100K × 10 views × 250 KB avg = 250 GB / day egress
  = ~7.5 TB / month

Those numbers immediately reshape what we worry about. 155 TB/year of permanent storage means we're not putting this in MySQL. At $0.023/GB/month for cloud object storage, a full year of accumulated uploads costs about $3,600 per month to keep (155,000 GB × $0.023), and in a SQL DB the same data would cost roughly 10× that. 7.5 TB/month of view traffic means egress is going to be one of our biggest line items if we serve users directly from origin. CDN is no longer "nice to have"; it's mandatory. The math tells us so before we've drawn anything.
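The sizing arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming decimal units (1 MB = 1,000 KB, 1 TB = 1,000 GB) and a 30-day month; the function name `size_system` and its parameter names are illustrative, while the default values are the figures from the sizing table.

```python
# Back-of-envelope sizing using the figures from the sizing table.
# Decimal units throughout: 1 MB = 1,000 KB, 1 GB = 1,000 MB, 1 TB = 1,000 GB.
SECONDS_PER_DAY = 86_400


def size_system(uploads_per_day=100_000, views_per_upload=10,
                variant_kb=(50, 200, 1_000, 3_000), avg_view_kb=250,
                peak_factor=10):
    """Return the headline numbers for one scale point as a dict."""
    per_upload_mb = sum(variant_kb) / 1_000
    avg_writes = uploads_per_day / SECONDS_PER_DAY
    daily_storage_gb = uploads_per_day * per_upload_mb / 1_000
    daily_egress_gb = uploads_per_day * views_per_upload * avg_view_kb / 1_000_000
    return {
        "avg_writes_per_sec": avg_writes,
        "peak_writes_per_sec": avg_writes * peak_factor,
        "per_upload_mb": per_upload_mb,
        "daily_storage_gb": daily_storage_gb,
        "yearly_storage_tb": daily_storage_gb * 365 / 1_000,
        "monthly_egress_tb": daily_egress_gb * 30 / 1_000,  # 30-day month
    }


if __name__ == "__main__":
    for name, value in size_system().items():
        print(f"{name}: {value:,.2f}")
```

Changing `uploads_per_day` or dropping a variant from `variant_kb` shows quickly which inputs dominate storage growth and which dominate egress.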

§ 03 — The metadata / blob split

SQL for the row.
S3 for the file.

The biggest architectural decision in any media-heavy system is this one: where does the binary data live, and where does the structured info about it live? The naive answer — "put it all in MySQL" — works for ten images and breaks for ten million. The right answer separates the two. Tiny structured metadata (rows, indexed, queryable) goes in your SQL database. Large binary blobs (the actual JPEG bytes) go in object storage: S3, GCS, or equivalent. The metadata row just holds the URL or key pointing to the blob.

// SAME IMAGE · TWO PLACES · ONE LINK

// SQL · METADATA ROW

Small, indexed, queryable

CREATE TABLE images (
  id          UUID PRIMARY KEY,
  user_id     UUID,
  uploaded_at TIMESTAMPTZ,
  caption     TEXT,
  width       INT,
  height      INT,
  thumb_key   VARCHAR(255),  -- e.g. "2024/01/15/abc123-thumb.jpg"
  medium_key  VARCHAR(255),
  full_key    VARCHAR(255),
  original_key VARCHAR(255),
  status      VARCHAR(20)    -- 'processing', 'ready', 'failed'
);
CREATE INDEX idx_user_recent
  ON images(user_id, uploaded_at DESC);

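~500 bytes per row. 100M images = 50 GB. Fits comfortably on one Postgres box. The index covers the query that matters most: "this user's recent images." A global query such as "all images uploaded in the last 24h" filters on uploaded_at alone and would need its own index on that column.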
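To see the "this user's recent images" query in action, here is a small sketch using Python's built-in `sqlite3` as a stand-in for Postgres. The table is trimmed to the columns the query touches, timestamps are stored as ISO-8601 strings (an assumption made so they sort chronologically in SQLite), and the sample rows and the `recent_images` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (
  id          TEXT PRIMARY KEY,
  user_id     TEXT NOT NULL,
  uploaded_at TEXT NOT NULL,   -- ISO-8601 UTC string, sorts chronologically
  thumb_key   TEXT,            -- object-store key, not the bytes themselves
  status      TEXT NOT NULL    -- 'processing', 'ready', 'failed'
);
CREATE INDEX idx_user_recent ON images(user_id, uploaded_at DESC);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
    [
        ("img1", "u1", "2024-01-15T09:00:00Z", "2024/01/15/img1-thumb.jpg", "ready"),
        ("img2", "u2", "2024-01-15T10:00:00Z", "2024/01/15/img2-thumb.jpg", "ready"),
        ("img3", "u1", "2024-01-16T08:30:00Z", "2024/01/16/img3-thumb.jpg", "ready"),
        ("img4", "u1", "2024-01-16T09:00:00Z", None, "processing"),
    ],
)


def recent_images(conn, user_id, limit=20):
    """Newest-first list of (id, thumb_key) for one user's finished images."""
    cur = conn.execute(
        "SELECT id, thumb_key FROM images "
        "WHERE user_id = ? AND status = 'ready' "
        "ORDER BY uploaded_at DESC LIMIT ?",
        (user_id, limit),
    )
    return cur.fetchall()


print(recent_images(conn, "u1"))
```

The query returns only keys; the client (or the CDN) fetches the actual bytes from object storage using those keys.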

// OBJECT STORE · BINARY DATA

Huge, opaque, dirt-cheap

# S3 bucket layout
s3://app-images/
  ├── 2024/01/15/
  │     ├── abc123-thumb.jpg    (50 KB)
  │     ├── abc123-medium.jpg   (200 KB)
  │     ├── abc123-full.jpg     (1 MB)
  │     └── abc123-original.jpg (3 MB)
  └── 2024/01/16/...

# Object stores:
# - replicate across 3+ AZs automatically
# - 11 nines of durability
# - $0.023/GB/month (way cheaper than EBS)
# - infinitely scalable, no resizing

The key in S3 is the file path. The metadata row holds these keys; the app fetches them from S3 on demand. S3 doesn't care if you have 10 objects or 10 trillion — same API, same pricing.
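One way to keep the metadata row and the bucket layout in agreement is to derive every key from the image ID and upload date in a single place. The sketch below follows the date-prefixed layout shown above; the helper names `variant_keys` and `object_url` are hypothetical, and the `.jpg` extension is assumed for every variant.

```python
from datetime import date

# Variant names match the key columns in the metadata table.
VARIANTS = ("thumb", "medium", "full", "original")


def variant_keys(image_id: str, uploaded_on: date) -> dict[str, str]:
    """Build the object-store key for each variant of one image."""
    prefix = uploaded_on.strftime("%Y/%m/%d")
    return {v: f"{prefix}/{image_id}-{v}.jpg" for v in VARIANTS}


def object_url(bucket: str, key: str) -> str:
    """Join bucket and key into an s3:// style locator."""
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    keys = variant_keys("abc123", date(2024, 1, 15))
    for name, key in keys.items():
        print(name, object_url("app-images", key))
```

The four values returned by `variant_keys` are what would be written into `thumb_key`, `medium_key`, `full_key`, and `original_key`.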

This split unlocks several wins. Your SQL DB stays tiny — 50 GB of metadata is trivial for Postgres; that same Postgres holding 155 TB of JPEG blobs would be a nightmare to back up, index, and replicate. Object storage scales without operational pain — you don't provision S3 capacity; you just put more objects in. Pricing is predictable — pay per GB stored and per GB transferred, no overhead for "the box we'd need to host all this."

Never put binary blobs in a relational database. Object storage exists for exactly this reason.

The same pattern applies to videos, audio files, PDFs, backups, and any other "big opaque thing." Once you internalize the metadata/blob split, you'll see it everywhere. The rule of thumb: if it's queryable structured info, SQL. If it's a chunk of bytes you only ever fetch whole, object storage. Mix them at peril.

§ 04 — Storage tier visualizer · interactive lab

Now feel the
monthly bill.

Below: an interactive cost calculator. Pick your daily upload volume, the variants you'll generate, the storage class for originals, and whether you use a CDN. The architecture diagram updates and the live cost numbers show what each decision actually costs in dollars per month. Watch what happens when you turn off the CDN. Or move originals to Glacier. Or skip the medium variant. System design is sometimes engineering and sometimes accounting.

IMAGE_STORAGE.SIM // m.18 lab

// PIPELINE & STORAGE TIERS

Lab controls: daily uploads · variants generated · originals storage class · CDN for viewing

// variants generated

Thumbnail: 50 KB
Medium: 200 KB
Full size: 1 MB
Original (always): 3 MB

// monthly bill · steady state (year 1)

Stored: 12.7 TB
Egress: 7.5 TB
Per upload: 4.25 MB

Storage (S3): $293
Egress (CDN/origin): $76
SQL metadata (RDS): $45
Total / month: $414

// VERDICT

All four variants · Standard storage · CDN enabled

Baseline configuration. $414/month for 100K daily uploads with full quality variants and CDN. Try: turn off the CDN. Watch egress jump from $76 to $675. Try: move originals to Glacier. Storage drops by ~70%. Each toggle has a real-money consequence — and that's the senior-engineer lens for system design.
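The baseline bill can be approximated with simple multiplication. This is a rough sketch, not the lab's actual formula: the $0.023/GB storage price and the $45 SQL line come from the text, while the $0.09/GB origin egress rate and $0.01/GB CDN rate are assumptions chosen because they land on or near the lab's $675 and $76 figures. The 12,750 GB stored figure assumes the lab's 12.7 TB is about 30 days of uploads at 425 GB/day.

```python
STORAGE_PER_GB = 0.023        # from the text ($/GB/month, object storage)
ORIGIN_EGRESS_PER_GB = 0.09   # assumption: reproduces the lab's $675 for 7.5 TB
CDN_EGRESS_PER_GB = 0.01      # assumption: gives ~$75, close to the lab's $76
SQL_METADATA_MONTHLY = 45.0   # from the lab's baseline bill


def monthly_bill(stored_gb=12_750, egress_gb=7_500, use_cdn=True):
    """Estimate the monthly bill in dollars for one configuration."""
    storage = stored_gb * STORAGE_PER_GB
    rate = CDN_EGRESS_PER_GB if use_cdn else ORIGIN_EGRESS_PER_GB
    egress = egress_gb * rate
    total = storage + egress + SQL_METADATA_MONTHLY
    return {
        "storage": round(storage, 2),
        "egress": round(egress, 2),
        "sql": SQL_METADATA_MONTHLY,
        "total": round(total, 2),
    }


if __name__ == "__main__":
    print("with CDN:   ", monthly_bill(use_cdn=True))
    print("without CDN:", monthly_bill(use_cdn=False))
```

Under these assumed rates, turning the CDN off multiplies the egress line by nine while storage stays unchanged, which is the effect the verdict panel describes. Real prices vary by provider, region, and volume tier.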

§ 05 — Why upload returns fast

The async pipeline
that makes it feel instant.

One thing the lab visualizes but doesn't dwell on: the upload flow is asynchronous. When the user taps "upload," they need a "success ✓" within a second or two. But resizing a 3 MB original into three smaller variants can take seconds per image, sometimes longer for tricky formats. Doing that work synchronously would make uploads feel painfully slow. So we split the flow into two halves: the synchronous "your bytes are safe" half, and the asynchronous "we're processing it" half.

// THE TWO-PHASE UPLOAD · USER-FACING vs BACKGROUND

The pattern shows up everywhere: any time a user action triggers heavy work, you split it. The synchronous half does the minimum required for the response — store the bytes, insert the row, return success. The asynchronous half consumes the resulting event from a queue and does the rest in the background, updating the row's status as it goes (processing → ready, or failed if something blew up). The client polls for status updates or receives a push notification once ready.
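The two halves described above can be sketched with in-memory stand-ins. Everything here is illustrative: `blob_store` and `metadata` are plain dicts standing in for object storage and the SQL table, `queue.Queue` stands in for a real message queue, and `fake_resize` just truncates bytes instead of actually resizing an image.

```python
import queue
import uuid

blob_store = {}        # stand-in for object storage: key -> bytes
metadata = {}          # stand-in for the SQL table: image_id -> row dict
jobs = queue.Queue()   # stand-in for a durable message queue


def handle_upload(user_id: str, image_bytes: bytes) -> str:
    """Synchronous half: store the bytes, insert the row, enqueue, return."""
    image_id = str(uuid.uuid4())
    original_key = f"{image_id}-original.jpg"
    blob_store[original_key] = image_bytes
    metadata[image_id] = {
        "user_id": user_id,
        "original_key": original_key,
        "status": "processing",
    }
    jobs.put(image_id)
    return image_id   # the client can show "uploaded" at this point


def fake_resize(data: bytes, variant: str) -> bytes:
    """Placeholder for real image resizing; only shrinks the byte count."""
    divisor = {"thumb": 60, "medium": 15, "full": 3}[variant]
    return data[: max(1, len(data) // divisor)]


def process_next_job() -> None:
    """Asynchronous half: one worker step that builds variants for one image."""
    image_id = jobs.get()
    row = metadata[image_id]
    try:
        original = blob_store[row["original_key"]]
        for variant in ("thumb", "medium", "full"):
            key = f"{image_id}-{variant}.jpg"
            blob_store[key] = fake_resize(original, variant)
            row[f"{variant}_key"] = key
        row["status"] = "ready"
    except Exception:
        row["status"] = "failed"
    finally:
        jobs.task_done()


if __name__ == "__main__":
    new_id = handle_upload("u1", b"x" * 3_000)
    print(metadata[new_id]["status"])   # status right after the upload returns
    process_next_job()
    print(metadata[new_id]["status"])   # status after the worker has run
```

In a real deployment the worker would run in a separate process and the queue would survive restarts; the point of the sketch is only that `handle_upload` returns before any resizing happens.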

This is the same async pattern as analytics events in M.17, the same pattern as sending emails after a signup, the same pattern as generating PDFs of long reports. Once you've internalized "drop a message on a queue, let workers do the slow thing," it becomes the default tool for any work that doesn't need to block the user. The user experience stays snappy; the heavy lifting still happens; nobody waits.

§ 06 — Eight words for the storage tier

Vocabulary,
for the blob life.

These show up in every architecture review involving media, files, or large objects. Lock them in.

Object Storage

/ˈɒbdʒɪkt/

A storage system designed for large, opaque files stored under keys. S3, GCS, Azure Blob. Infinitely scalable, cheap, durable. The canonical home for media, backups, logs.

Variant / Derivative

/ˈvɛəriənt/

A processed version of an original — thumbnail, medium, full-size. Pre-generated and stored separately so reads serve the smallest acceptable size for the use case.

Storage Class

/klɑːs/

A tier on object storage trading cost for access speed. Standard (hot, most expensive per GB), Infrequent Access (cooler, cheaper to store but with per-retrieval fees), Glacier (cold, very cheap but slow retrieval).

Egress

/ˈiːɡrɛs/

Data leaving the cloud (to the public internet, to users, to other regions). Often the biggest line item in a cloud bill. Why CDNs exist.

Presigned URL

/priːˈsaɪnd/

A time-limited URL with an embedded signature, allowing direct client-to-S3 upload (or download) without giving the client your secret keys. Avoids routing huge files through your app servers.

Durability

/ˌdjʊərəˈbɪlɪti/

The probability that stored data is preserved over time. S3 advertises "11 nines" — 99.999999999%. Different from availability (the probability of being readable right now).

Async Worker

/ˈeɪsɪŋk/

A background process consuming a queue of jobs. The thing that does slow work (resize images, send emails, compute reports) without blocking the user-facing path.

Lifecycle Policy

/ˈlaɪfˌsaɪkl/

A rule on object storage that automatically moves data between tiers based on age or access pattern. Move originals to Glacier after 90 days, delete after 7 years, etc.
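Of the terms above, the presigned URL is the one whose mechanism is least obvious from the definition. The sketch below illustrates the general idea with an HMAC over the method, path, and expiry time. It is a toy scheme, not S3's actual signing algorithm; the secret, the `storage.example.com` host, and the `presign` and `verify` helpers are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical server-side key, never sent to clients


def _signature(method: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over the request details the URL is allowed to perform."""
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, ttl_seconds: int, now=None) -> str:
    """Issue a URL valid for one method and path until the expiry time."""
    issued = int(now if now is not None else time.time())
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Storage-side check: not expired, and signature matches the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(method, parsed.path, expires))


if __name__ == "__main__":
    url = presign("PUT", "/app-images/abc123-original.jpg", ttl_seconds=300)
    print(url)
    print(verify("PUT", url))
```

The app server issues the URL and the client uploads straight to storage with it, so the image bytes never pass through the app server. Changing the path, the method, or the expiry invalidates the signature.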

§ 07 — Knowledge check

Five questions.
Watch the bill.

Test the media-system intuition. Click an answer; explanation drops in instantly.


§ 08 — The recap

Three ideas to
carry forward.

The media-heavy system patterns are reusable everywhere blobs show up.

i

Split metadata from blobs

Tiny structured rows in SQL. Big opaque bytes in object storage. The row holds the key; the blob lives wherever. Never mix them.

ii

Variants beat raw originals

Pre-generate sizes for each use case. Serve the smallest one a viewer needs. Saves egress, saves CDN budget, saves user bandwidth.

iii

Sync the minimum, async the rest

User waits only for "your bytes are safe." Resize, optimize, transcode all happen on a queue + worker pipeline. The user moved on.

M.19 — Reading
architecture diagrams.
