Before any request reaches your load balancer, something has to turn google.com into 142.250.80.46. The system that does it is older than the web, runs everywhere, and is mostly invisible — until it isn't.
You type google.com into your browser. But routers, the actual machines that move packets, have no idea what google.com means. They route by IP address — a 32-bit number like 142.250.80.46. Before a single byte of your request can travel, something has to translate the name to the number. That something is DNS — the Domain Name System. It is the phonebook of the internet, and it answers billions of these queries every second without you ever noticing.
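To make the "32-bit number" point concrete, here is a small sketch using Python's standard ipaddress module. It only converts between the dotted form and the underlying integer for an IPv4 address; it performs no DNS lookup.

```python
import ipaddress

# The dotted form is a human-friendly rendering of one 32-bit integer (IPv4).
addr = ipaddress.IPv4Address("142.250.80.46")
as_int = int(addr)

print(as_int)                         # the same address as a single integer
print(as_int < 2**32)                 # it fits in 32 bits
print(ipaddress.IPv4Address(as_int))  # and converts back to the dotted form
```

Routers work with the integer form; the dotted notation exists only for people.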
Without DNS, you'd be memorizing IP addresses like phone numbers, except harder. Worse: whenever Google changes servers (which happens constantly), every address book in the world would need updating. DNS solves both problems: it gives every machine a human-readable name and lets the underlying IP change freely, with updates propagating worldwide in minutes. That's why it's the foundation under every other system we build.
Here's the genius bit: nobody has a complete copy of the phonebook. There are billions of domains; no single server could hold them all, and even if one could, it would be a catastrophic single point of failure. Instead, DNS is structured as a hierarchy in which each layer knows only enough to point you to the next. To find systemdesigntutorial.com, the lookup passes through four different servers, each one progressively more specific.
The flow is always the same. Root servers (there are 13 logical ones, replicated worldwide via Anycast) know only one thing: which servers are in charge of each top-level domain. TLD servers (.com, .dev, .org, etc.) know only one thing: which nameservers each domain has registered. Authoritative servers (operated by whoever owns the domain, often through Cloudflare, Route 53, or similar) hold the actual records. Each layer is small, fast, and replicated, so the system has no single point of failure.
One more thing: you don't ask all those servers yourself. There's a fourth player called the recursive resolver, usually your ISP's DNS server or a public one like 8.8.8.8 (Google) or 1.1.1.1 (Cloudflare). You ask it once; it climbs the hierarchy (root, then TLD, then authoritative) on your behalf, caches the result, and hands it back to you. That's the part the next lab walks through, step by step.
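The climb a recursive resolver performs can be sketched with three in-memory tables. This is a toy model, not real DNS: the server names and the final address (203.0.113.10, from a range reserved for documentation) are made up for illustration.

```python
# Toy model of the hierarchy. Every name and address here is hypothetical.
ROOT = {"com": "com-tld-server"}  # root: TLD -> who runs it
TLD = {"com-tld-server": {"systemdesigntutorial.com": "ns1.hypothetical-dns.example"}}
AUTHORITATIVE = {"ns1.hypothetical-dns.example": {"systemdesigntutorial.com": "203.0.113.10"}}


def recursive_resolve(domain):
    """Walk root -> TLD -> authoritative, the way a recursive resolver would."""
    trail = []

    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]  # root only knows who handles the TLD
    trail.append(f"root: .{tld} is handled by {tld_server}")

    nameserver = TLD[tld_server][domain]  # TLD only knows the domain's nameserver
    trail.append(f"tld: {domain} is served by {nameserver}")

    ip = AUTHORITATIVE[nameserver][domain]  # authoritative holds the actual record
    trail.append(f"authoritative: {domain} is at {ip}")
    return ip, trail


ip, trail = recursive_resolve("systemdesigntutorial.com")
for step in trail:
    print(step)
```

Note that no table contains the whole answer; each one only narrows the search for the next.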
The phonebook metaphor undersells DNS a little. It actually stores several kinds of records about a domain — IP addresses, yes, but also email server hints, security tokens, verification proofs. Each kind is called a record type, and you'll meet most of these in your first month of dealing with any production system.
Two typical examples: www. CNAMEs to a Cloudflare hostname, which then resolves to an A record, and MX records make sure mail sent to you@yoursite.com reaches the right place.
There are more (PTR for reverse lookups, SRV for service discovery, CAA for which certificate authorities may issue certificates, SOA for zone admin info), but those six handle 95% of what you'll actually encounter. The mental model: a DNS record is a (name, type, value) triple, and querying DNS means asking "what's the X record for Y?", where X is one of these types.
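The (name, type, value) model is easy to sketch in a few lines. The records below are hypothetical placeholders, and resolve_a is an illustrative helper that follows CNAME aliases until it reaches an A record; real resolvers do considerably more.

```python
from collections import namedtuple

Record = namedtuple("Record", ["name", "type", "value"])

# Hypothetical zone data; hostnames and the address are placeholders.
RECORDS = [
    Record("www.yoursite.com", "CNAME", "yoursite.hypothetical-cdn.example"),
    Record("yoursite.hypothetical-cdn.example", "A", "203.0.113.7"),
    Record("yoursite.com", "MX", "mail.yoursite.com"),
]


def query(name, rtype):
    """Answer 'what is the <rtype> record for <name>?'."""
    return [r.value for r in RECORDS if r.name == name and r.type == rtype]


def resolve_a(name, max_hops=5):
    """Find A records for a name, following CNAME aliases along the way."""
    for _ in range(max_hops):
        addresses = query(name, "A")
        if addresses:
            return addresses
        aliases = query(name, "CNAME")
        if not aliases:
            return []
        name = aliases[0]  # continue the lookup at the alias target
    return []


print(query("yoursite.com", "MX"))
print(resolve_a("www.yoursite.com"))
```

The max_hops limit is a guard against alias loops, an assumption of this sketch rather than a fixed DNS rule.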
Below: a working DNS resolution. Pick a domain, hit Resolve, and watch the query travel through every cache and every server. Then resolve it again — the second time, almost everything is cached. Hit Clear caches to start over.
A cold first resolution makes 4 network queries: client → recursive → root → TLD → authoritative. Each adds latency. But every layer caches the result, so the second lookup is dramatically faster — usually a single browser-cache hit. This caching is why DNS feels invisible most of the time.
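The cold-versus-warm difference can be sketched by counting queries. This is a deliberately simplified model: cold_lookup stands in for the whole four-hop path, the three cache layers are plain dictionaries, and the address is a placeholder.

```python
stats = {"network_queries": 0}

# Caches are checked nearest-first: browser, then OS, then recursive resolver.
caches = {"browser": {}, "os": {}, "resolver": {}}


def cold_lookup(domain):
    """Stand-in for the full client -> recursive -> root -> TLD -> authoritative path."""
    stats["network_queries"] += 4  # the four hops of a cold resolution
    return "203.0.113.10"          # placeholder address


def lookup(domain):
    for layer, cache in caches.items():
        if domain in cache:
            return cache[domain], f"{layer} cache hit"
    ip = cold_lookup(domain)
    for cache in caches.values():  # every layer remembers the answer
        cache[domain] = ip
    return ip, "cold lookup"


print(lookup("example.com"), stats)  # first call pays for the network queries
print(lookup("example.com"), stats)  # second call is answered by the nearest cache
```

Emptying the three dictionaries would put the model back in its cold state.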
Every DNS record carries a TTL — Time To Live — which tells every cache between you and the authoritative server how long it's allowed to hold the answer. After the TTL expires, the next request triggers a fresh lookup. This is the dial that balances speed (long TTL means more cache hits) against flexibility (short TTL means changes propagate fast).
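A minimal sketch of how a cache might honor TTLs follows. The TTLCache class is hypothetical, written for this lesson; the clock is injectable so expiry can be shown without actually waiting an hour.

```python
import time


class TTLCache:
    """Holds answers only until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the next request needs a fresh lookup
            return None
        return value


# A fake clock (seconds) lets us jump forward in time.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("example.com", "203.0.113.10", ttl_seconds=3600)

now[0] = 3599.0
print(cache.get("example.com"))  # still cached
now[0] = 3600.0
print(cache.get("example.com"))  # expired, so None
```

A longer ttl_seconds keeps get() returning the cached value for longer; a shorter one forces fresh lookups sooner, which is exactly the speed-versus-flexibility dial.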
3600 seconds is one of the most common values in production.
The pragmatic workflow: set TTL to your normal default (1 hour, say) for stable records. When you know a change is coming, lower it to 5 minutes a day in advance. Make the change. Watch traffic shift. Then raise the TTL back up. This is one of the small operational rituals that distinguishes a calm migration from a 3-hour outage.
One last subtlety worth knowing: browsers and OSes don't always honor TTLs exactly. Chrome caches DNS for ~60 seconds regardless of what the record says; some Windows versions hold for much longer. The TTL is a hint, not a contract. Plan for some clients to take 2× the TTL to update.
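The workflow and the 2× allowance above reduce to a little date arithmetic. The function below is a hypothetical planning helper, not a tool from any DNS provider; its defaults (1 hour normal TTL, 5 minute low TTL, one day of lead time) are taken from the numbers in this lesson.

```python
from datetime import datetime, timedelta


def ttl_migration_plan(change_at, normal_ttl_s=3600, low_ttl_s=300,
                       lead=timedelta(days=1)):
    """Work out when to lower the TTL, make the change, and raise the TTL again."""
    # A cache filled just before the TTL is lowered may keep the old answer
    # (and the old TTL) for normal_ttl_s, so the lead time must cover that.
    if lead.total_seconds() < normal_ttl_s:
        raise ValueError("lower the TTL at least one normal TTL before the change")

    lower_at = change_at - lead
    # Well-behaved caches update within low_ttl_s; budget 2x for slow clients.
    settled_at = change_at + timedelta(seconds=2 * low_ttl_s)
    return {
        "lower_ttl_at": lower_at,
        "change_at": change_at,
        "raise_ttl_after": settled_at,
    }


plan = ttl_migration_plan(datetime(2030, 1, 2, 12, 0))
for step, when in plan.items():
    print(step, when)
```

The ValueError guards the one mistake that defeats the ritual: lowering the TTL so late that some caches still hold the record under the old, longer TTL when the change lands.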
You'll see these in every postmortem about a DNS outage — which is to say, in every postmortem. Get fluent.
DNS: maps human-readable names (google.com) to machine addresses (142.250.80.46). Older than the web, foundational to everything.
Recursive resolver: the server that performs the lookup on your behalf, such as 8.8.8.8 or 1.1.1.1.
The phonebook is no longer mysterious. Next stop: CDNs — how the internet uses DNS to deliver content from servers near you.
DNS is the invisible substrate. Knowing how it works prevents 80% of "production is broken and nobody knows why" incidents.
Root → TLD → Authoritative. Each layer holds little, points to the next. No single point of failure, scales to billions of domains.
Browser, OS, recursive resolver — all cache results. The first lookup is slow; every subsequent one is near-instant. This is what makes DNS feel fast.
Long TTL = fast and stable. Short TTL = nimble, but with more query load on resolvers and nameservers. Most records are 1 hour by default; lower before planned changes.