```mermaid
flowchart LR
A["<b>DHCP</b><br/>Get IP address,<br/>router IP, DNS IP<br/><i>→ broadcast on LAN</i>"]
B["<b>ARP</b><br/>Resolve router's<br/>MAC address<br/><i>→ broadcast query</i>"]
C["<b>DNS</b><br/>Resolve hostname<br/>to IP address<br/><i>→ hierarchical lookup</i>"]
D["<b>TCP</b><br/>Establish reliable<br/>connection<br/><i>→ 3-way handshake</i>"]
E["<b>HTTP</b><br/>Fetch web page<br/>content<br/><i>→ GET request</i>"]
A --> B --> C --> D --> E
style A fill:#4477AA,color:#fff,stroke:#4477AA
style B fill:#66CCEE,color:#000,stroke:#66CCEE
style C fill:#228833,color:#fff,stroke:#228833
style D fill:#CCBB44,color:#000,stroke:#CCBB44
style E fill:#EE6677,color:#fff,stroke:#EE6677
```
2 First Principles for Networked Systems
2.1 The Question Behind Every System
In an introductory networking course, you learn how individual protocols work: TCP delivers bytes reliably, DNS resolves names, HTTP fetches web pages. Each protocol has its own rules, its own header format, its own behavior. What you may not yet see is that every one of these protocols is a networked system — a collection of components that must make design choices about the same four questions:
- What information should the system maintain? (State)
- When do things happen? (Time)
- Who makes decisions? (Coordination)
- How do components talk to each other? (Interface)
These four questions define the design space of networked systems. They are structural invariants: irreducible questions that every system must answer, regardless of whether it operates at the physical layer, the transport layer, or the application layer. A WiFi access point answers them. TCP answers them. A video streaming client answers them. A broadband measurement system answers them. The answers differ radically — but the questions do not.
You can study each invariant on its own — asking “what state does TCP maintain?” is a coherent question independent of “who coordinates?” But the answers interact. TCP’s choice of distributed coordination (no central authority) forces it to build state from local observations only. Its interface inheritance from IP (unreliable datagrams) rules out centralized scheduling. The four invariants are separable for analysis but coupled in practice: the answer to one shapes what answers are feasible for the others.
This chapter introduces the analytical framework that organizes the rest of the book. The framework has three components:
- Four invariants that define what must be answered: State, Time, Coordination, Interface.
- Three design principles that describe recurring strategies for constructing good answers under constraints: Disaggregation, Closed-loop reasoning, Decision placement.
- A method — the anchored dependency graph — that traces how one design choice constrains the rest.
The framework explains failures and generates design alternatives. Together, these components give you a vocabulary for describing any networked system, reasoning tools for predicting system behavior, and a method for producing new designs when environmental constraints shift.
Three intellectual traditions underpin this framework. Wiener’s cybernetic program [Wiener, 1948] identified the feedback loop as the organizing principle of adaptive systems — the foundation of our Closed-loop reasoning principle. Saltzer, Reed, and Clark’s end-to-end arguments [Saltzer et al., 1984] formalized where functionality belongs in layered architectures — the question our Interface invariant and Decision placement principle address. Shannon’s information theory [Shannon, 1948] established fundamental limits on what any observer can learn about a noisy channel — the constraint that shapes every system’s State invariant, because measurement signals are always partial and delayed. This book’s contribution is the specific four-invariant decomposition, the anchored dependency graph as an analytical tool, and the separation of invariants from principles as a pedagogical structure for teaching networked systems.
2.2 A Motivating Example: What Happens When You Load a Web Page
You open your laptop, connect to WiFi, and type www.google.com into a browser. Before the page renders, five protocols must fire in sequence: DHCP, ARP, DNS, TCP, and HTTP. Each protocol solves a different problem, uses a different strategy, and makes different structural choices. Figure 2.1 shows the network scenario — a laptop on a local network, a WiFi router that serves as the first hop, and servers reachable through the Internet. This scenario — familiar from an introductory networking course — is the framework’s first test case. Walk through it concretely, then ask: why do these five protocols look so different from each other?
The laptop starts with nothing. It has no IP address, no router address, no DNS server address. It cannot send a targeted request because it does not know whom to ask. DHCP (Dynamic Host Configuration Protocol) resolves this bootstrap problem. The client constructs a DHCP Discover message and broadcasts it on the LAN using destination MAC address FF:FF:FF:FF:FF:FF — the Ethernet broadcast address that every device on the local network receives. The encapsulation stack is DHCP → UDP → IP → Ethernet. A router on the LAN runs the DHCP server, receives the broadcast, and replies with three critical pieces of information: the client’s assigned IP address, the first-hop router’s IP address, and the DNS server’s IP address. Why UDP and not TCP? TCP requires an established connection, and establishing a connection requires the client to already have an IP address — the very thing it lacks. UDP’s connectionless delivery sidesteps the circular dependency.
The client now knows the router’s IP address (say 10.0.0.1) but cannot send an Ethernet frame to the router without its MAC address. Ethernet frames are addressed by MAC, not by IP. ARP (Address Resolution Protocol) bridges this gap. The client broadcasts an ARP query: “Who has 10.0.0.1? Tell me your MAC address.” The router replies with its MAC address (say 00:1A:2B:3C:4D:5E). The client caches this mapping and can now address Ethernet frames directly to the router. Without ARP, every data frame would need to be broadcast to the entire LAN — extremely wasteful on a network with dozens or hundreds of devices.
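The caching behavior described above can be sketched as a small lookup table with expiry. This is a toy illustration, not the kernel's actual neighbor table; the class name and the 20-minute default are assumptions for the sketch:

```python
import time

class ArpCache:
    """Toy ARP cache: IP -> (MAC, insertion time), with entry expiry."""

    def __init__(self, timeout_s=1200):   # ~20 minutes, a common default
        self.timeout_s = timeout_s
        self.entries = {}

    def insert(self, ip, mac):
        self.entries[ip] = (mac, time.monotonic())

    def lookup(self, ip):
        entry = self.entries.get(ip)
        if entry is None:
            return None                   # miss: broadcast an ARP query
        mac, inserted = entry
        if time.monotonic() - inserted > self.timeout_s:
            del self.entries[ip]          # stale: re-query rather than risk a wrong MAC
            return None
        return mac
```

A cache hit lets the client skip the broadcast entirely; a miss or an expired entry falls back to the broadcast query described above.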
The client can reach the router. Now it needs to resolve www.google.com to an IP address. The client constructs a DNS query and encapsulates it in a UDP segment destined for port 53 on the DNS server. The query leaves the local network, traversing potentially multiple routed networks — forwarded by OSPF within an administrative domain, by BGP across domain boundaries. The DNS server resolves the name through a hierarchy of lookups (root server → .com TLD server → Google’s authoritative server) and replies with an IP address: 216.58.193.196.
The client knows the server’s IP address. Now it establishes a reliable transport connection. TCP’s three-way handshake fires: the client sends a SYN segment to 216.58.193.196 port 80, the server replies with SYN-ACK, and the client completes the handshake with ACK. The connection is established. TCP will handle reliability (retransmitting lost segments), ordering (reassembling out-of-order arrivals), and congestion control (adapting the sending rate to available capacity) for everything that follows.
The client sends an HTTP GET request through the TCP socket: GET / HTTP/1.1. The web server processes the request and replies with the HTML content of the page. The browser parses the HTML, requests additional resources (CSS, JavaScript, images) through additional HTTP requests on the same or parallel TCP connections, and renders the page. The entire sequence — DHCP, ARP, DNS, TCP, HTTP — fired in under a second.
Figure 2.2 summarizes the five-step sequence. Each protocol solves a different problem and leaves the laptop with a new piece of state that the next protocol requires.
Five protocols, five different designs. DHCP broadcasts; ARP broadcasts; DNS traverses a hierarchy; TCP handshakes between two endpoints; HTTP issues a stateless request-response. Why such diversity? Each protocol solves a different problem under different constraints. The framework’s claim: the differences are systematic. Each protocol answers the four invariants differently because of the constraints it faces. The rest of this section traces those answers explicitly.
DHCP answers each invariant in a characteristic way.
- State: Centralized — the DHCP server allocates IP addresses from a pool and tracks active leases. A single authority prevents address conflicts; two servers independently allocating from the same pool could assign the same IP to different clients.
- Time: Lease-based — the server grants each address for a fixed duration (commonly 24 hours on enterprise networks, 1–8 hours on public WiFi). The client must renew before the lease expires or lose its address. Lease duration balances address reuse (short leases reclaim addresses from departed clients quickly) against overhead (frequent renewals consume bandwidth and server processing).
- Coordination: Centralized — the server decides which address to assign, and the client accepts. The client can request a specific address, but the server has final authority.
- Interface: UDP broadcast on the LAN, because the client has no IP address with which to send a targeted request. The interface choice is forced by the bootstrap constraint: DHCP must work before the client has the very configuration DHCP provides.
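The centralized State and Coordination answers above can be sketched together: one pool, one authority, and a client hint the server is free to ignore. A toy model, assuming in-memory state and ignoring lease expiry:

```python
class DhcpServer:
    """Toy centralized allocator: one pool, one authority.
    (Real DHCP also tracks lease expiry and client identifiers.)"""

    def __init__(self, pool):
        self.free = list(pool)        # addresses not currently leased
        self.leases = {}              # client MAC -> leased IP

    def allocate(self, mac, requested=None):
        if mac in self.leases:        # renewal: keep the same address
            return self.leases[mac]
        if requested in self.free:    # honor the client's hint if available
            self.free.remove(requested)
        elif self.free:
            requested = self.free.pop(0)
        else:
            return None               # pool exhausted: a DHCPNAK in practice
        self.leases[mac] = requested
        return requested
```

Because a single object owns `free` and `leases`, no two clients can ever be handed the same address — the uniqueness guarantee that forces centralization in the first place.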
DNS answers the invariants through hierarchical distribution.
- State: Hierarchically distributed — root servers know TLD servers, TLD servers know authoritative servers, authoritative servers know the actual IP addresses. Every resolver along the path caches recent results. No single server stores every name-to-address mapping; the global namespace is partitioned across thousands of authoritative servers.
- Time: TTL-based — each DNS record carries a time-to-live that tells resolvers how long to cache it. Short TTLs (30–60 seconds) enable fast updates, useful when migrating a server to a new IP address, but increase query load on authoritative servers. Long TTLs (hours to days) reduce load but make changes slow to propagate — a stale cache continues serving the old address until the TTL expires.
- Coordination: Hierarchical delegation — each level of the hierarchy is authoritative for its own zone and delegates everything below. The root zone delegates .com to Verisign’s TLD servers; Verisign delegates google.com to Google’s authoritative servers. No single server needs to know every name.
- Interface: UDP queries on port 53 (with TCP fallback for responses exceeding 512 bytes), layered on top of IP routing.
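The delegation chain can be sketched as a walk from the root down, one zone per step. The data below is a toy stand-in for NS records, with real-world names used purely as illustration:

```python
# Toy delegation tree: each zone knows only its own children.
# A value that names another zone is a delegation; anything else is an answer.
ZONES = {
    ".":            {"com.": "com."},
    "com.":         {"google.com.": "google.com."},
    "google.com.":  {"www.google.com.": "216.58.193.196"},
}

def resolve(name, zone="."):
    """Follow the delegation chain from the root down to an answer."""
    for suffix, answer in ZONES[zone].items():
        if name.endswith(suffix):
            if answer in ZONES:      # a delegation: descend into the child zone
                return resolve(name, answer)
            return answer            # an authoritative answer: done
    return None                      # no matching delegation at this level
```

Each recursive call crosses one administrative boundary — exactly the structure that lets no single server hold the whole namespace.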
TCP is the richest case.
- State: Distributed — sender and receiver maintain independent state machines tracking sequence numbers, acknowledgment numbers, the congestion window, and RTT estimates. The congestion window deserves special attention. It is a belief about available capacity, built from a measurement signal (ACK arrivals and their timing) that partially and noisily reflects the environment (bottleneck queue occupancy, competing traffic, link error rates). This three-layer decomposition — environment state, measurement signal, internal belief — is one of the framework’s most important diagnostic tools. When TCP misjudges available capacity, the question is always: where did the belief diverge from the environment? Was the measurement signal delayed by a large buffer? Corrupted by non-congestion loss? We will return to this decomposition repeatedly throughout the book.
- Time: Inferred — TCP estimates round-trip time from ACK arrivals using Jacobson’s algorithm [Jacobson, 1988]. No shared clock exists between sender and receiver; IP provides no timing guarantees. The retransmission timeout (RTO) determines how quickly TCP detects loss — set it too low, and TCP retransmits segments that were merely delayed, wasting bandwidth; set it too high, and the connection stalls for hundreds of milliseconds after a genuine loss.
- Coordination: Distributed — each sender adjusts its rate independently via AIMD (additive increase, multiplicative decrease), with no central scheduler. Competing flows sharing a bottleneck link converge toward fairness without ever communicating with each other — each sender responds to the same congestion signal (packet loss at the bottleneck) and independently reduces its rate.
- Interface: Reliable byte stream above (the socket API that applications use), unreliable datagrams below (IP’s best-effort packet delivery).
Table 2.1 summarizes how the three protocols answer each invariant. The pattern is visible at a glance: different problems produce different structural choices.
| Invariant | DHCP | DNS | TCP |
|---|---|---|---|
| State | Centralized — server tracks address pool and leases | Hierarchically distributed — partitioned namespace with caching at every level | Distributed — endpoints maintain independent state machines (cwnd, SRTT, sequence numbers) |
| Time | Lease-based — fixed duration, client renews or loses address | TTL-based — each record carries an expiration; short TTLs enable fast updates, long TTLs reduce load | Inferred — RTT estimated from ACK arrivals via Jacobson’s algorithm; RTO governs loss detection |
| Coordination | Centralized — server decides, client accepts | Hierarchical delegation — each zone authoritative for its own names, delegates below | Distributed — each sender adjusts independently via AIMD; no central scheduler |
| Interface | UDP broadcast (forced by bootstrap: client has no IP yet) | UDP on port 53 (TCP fallback for large responses), layered on IP routing | Reliable byte stream above (socket API); unreliable datagrams below (IP) |
Why these particular answers? The constraints shaped the designs. DHCP centralizes because unique address assignment requires a single authority — without one, independent allocators risk assigning the same IP to different clients, breaking connectivity for both. TCP distributes because the Internet’s administrative decentralization means no single entity can observe all flows traversing a bottleneck link, let alone schedule them. DNS hierarchicalizes because global name resolution is too large for one server (billions of records) and too coordination-intensive for pure distribution (every resolver would need every record) — the hierarchy trades query latency (multiple resolution steps, each adding tens of milliseconds) for scalability (each server handles only its own subtree). Historical contingency, standardization politics, and deployability compromises also shaped these protocols — the framework makes the structural constraints explicit; meta-constraints (§1.7) account for the rest.
These three protocols — DHCP, DNS, and TCP — will serve as recurring examples throughout this chapter. The later chapters apply the same invariant analysis to systems you have not yet studied in this depth: wireless medium access and the hidden terminal problem (Chapter 2), cellular architecture and the CU/DU/RU disaggregation (Chapter 3), congestion control algorithms beyond AIMD — Cubic, BBR, and their design tradeoffs (Chapter 4), queue management and the bufferbloat pathology (Chapter 5), video streaming and VoIP under best-effort delivery (Chapter 7), and network measurement under information asymmetry (Chapter 8). The analytical method is the same in every case: identify the anchor, answer the four invariants, trace the dependency graph, evaluate the closed-loop dynamics.
2.3 The Four Invariants
The framework’s analytical core is a decomposition into four irreducible questions that every networked system must answer. They are structural invariants. A system that does not answer one of them has not avoided the question; it has answered it by default, usually badly. Figure 2.3 visualizes how the four invariants and their solution spaces interact.
Answering these four questions well requires understanding each invariant’s dimensions and the constraints that limit feasible answers. The sections below unfold each invariant in detail.
2.3.1 State — What Exists?
Every system maintains state — objects, resources, configuration, histories. TCP maintains a connection state machine. A DNS resolver maintains a cache of recent name-to-address mappings. A router maintains a forwarding table built from routing protocol advertisements.
Among the four invariants, State is the most diagnostically useful, because it decomposes into three layers:
Environment state is what actually exists, independent of any observer. The true bottleneck queue occupancy on a network path. The actual available bandwidth. The true congestion level at every router along a path. The environment exists whether or not anyone measures it.
Measurement signal is what an agent can observe about the environment, and at what cost. ACK arrival times in TCP. Timeout events in TCP. DNS TTL expiration as a signal that a cached record may be stale. Every measurement is partial, delayed, and potentially misleading.
Internal belief is the agent’s model of the environment, constructed from measurement signals. TCP’s congestion window is a belief about available capacity. A DNS cache entry is a belief that a domain still maps to a particular IP address. A routing table entry is a belief about the best next hop toward a destination.
The gap between environment state and internal belief is where most system failures live. Bufferbloat occurs when TCP’s belief about available capacity diverges from reality because large buffers delay the loss signal. DNS cache poisoning occurs when a resolver’s belief about a domain’s address diverges from reality because a malicious response corrupted the measurement signal. In both cases, the measurement signal fails to track the environment — and the system makes decisions based on a stale or incorrect model.
This three-layer decomposition — environment, measurement, belief — is one of the framework's most useful diagnostic tools. When a system fails, ask: where did the belief diverge from the environment? Was the measurement signal delayed? Ambiguous? Missing entirely? The answer identifies not just the failure but the class of fix required.
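The decomposition can be made concrete with a toy model in which the belief is simply the measurement signal delayed by a few samples — the lag alone is enough to make the belief wrong exactly when the environment changes:

```python
def delayed_belief(environment, delay):
    """Belief at time t = the environment as it was `delay` samples ago.
    A deliberately minimal model of a delayed measurement signal."""
    return [environment[max(0, t - delay)] for t in range(len(environment))]
```

With environment `[10, 10, 10, 200, 200, 200]` (a queue suddenly filling) and a delay of 2 samples, the belief still reports 10 for two steps after the spike — precisely the window in which a sender keeps transmitting as if capacity were still available.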
Later chapters apply this three-layer decomposition to systems where the environment-measurement-belief gap produces different failure modes: wireless medium access (Chapter 2), queue management (Chapter 5), and adaptive video streaming (Chapter 7).
2.3.2 Time — When Do Things Happen?
Every system must handle time and ordering. Some systems prescribe time: Ethernet defines a fixed slot time for collision detection, and DHCP assigns lease durations that clients must respect. Other systems infer time: TCP estimates round-trip time from ACK arrivals, and DNS resolvers use TTL values to decide when a cached record has expired. The distinction matters.
The choice between prescribed and inferred time has architectural consequences. Prescribed time requires shared infrastructure — a common clock or a common medium. Ethernet’s collision detection requires all stations on a segment to agree on a slot time. Inferred time requires only local observation — TCP estimates RTT from its own ACK arrivals, making it deployable across any path without shared infrastructure. The tradeoff: prescribed time enables precise coordination but demands shared infrastructure; inferred time works anywhere but the inference is noisy, delayed, and sometimes wrong.
Time governs failure detection, synchronization, and adaptation speed. TCP’s retransmission timeout determines how quickly it detects loss. Ethernet’s slot time determines collision detection range. DNS’s TTL determines how quickly clients learn about address changes. Many protocol designs are fundamentally about making time tractable. Jacobson’s RTT estimator [Jacobson, 1988] transforms noisy per-packet timing observations into a smoothed estimate that TCP can act on. Lamport clocks replace physical time with logical ordering [Lamport, 1978]. In each case, the redesign changes what “time” means for the system — and the consequences cascade through the other invariants.
Walk through the three focal protocols to see how different timing strategies arise from different constraints.
DHCP: prescribed time through lease duration. DHCP’s timing is entirely prescribed by the server. When the server grants an IP address, it attaches a lease duration — say 86,400 seconds (24 hours) on a campus network, or 3,600 seconds (1 hour) on a coffee-shop hotspot. The client treats this duration as a contract: the address is valid until the lease expires. The protocol prescribes two renewal thresholds. At T1 (default: 50% of the lease duration), the client unicasts a DHCPREQUEST to the original server, attempting a quiet renewal. If the server is reachable and willing, it responds with a DHCPACK carrying a fresh lease, and the client resets its timers. If the server does not respond by T2 (default: 87.5% of the lease), the client escalates: it broadcasts a DHCPREQUEST to any DHCP server on the LAN, because the original server may have failed. If no server responds by lease expiration, the client must relinquish the address — it loses its IP configuration and must restart the DHCP discovery process from scratch. The failure mode is stark: a client whose lease expires mid-session loses all active connections. The timing parameters encode a tradeoff. Short leases reclaim addresses quickly from departed clients (important on networks with high churn, like a conference venue), but generate more renewal traffic and more opportunities for the client to lose its address if the server is briefly unreachable. Long leases reduce overhead but leave addresses tied to clients that may have left hours ago.
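The renewal schedule above reduces to three thresholds derived from the lease duration. This sketch transcribes the T1/T2 defaults from the text; the dictionary key names are illustrative:

```python
def dhcp_timers(lease_s, t1_frac=0.5, t2_frac=0.875):
    """Renewal schedule for one lease: T1 at 50%, T2 at 87.5% (defaults)."""
    return {
        "t1_unicast_renew":    lease_s * t1_frac,  # quiet renewal to the original server
        "t2_broadcast_rebind": lease_s * t2_frac,  # escalate: any server on the LAN
        "expire_release":      lease_s,            # give up the address entirely
    }
```

For a 24-hour campus lease (86,400 seconds), the client tries a unicast renewal at 12 hours and broadcasts at 21 hours — leaving a final 3-hour window before it must relinquish the address.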
DNS: prescribed time, inferred expiration. DNS timing operates through TTL (time-to-live) values, but the perspective matters. From the authoritative server’s viewpoint, the TTL is prescribed — the zone administrator sets it explicitly, encoding a judgment about how stable the record is. Google might set www.google.com to a TTL of 300 seconds, reflecting frequent infrastructure changes; a university might set its mail server’s record to 86,400 seconds, reflecting a stable mapping. From the resolver’s viewpoint, however, the TTL is an inferred countdown — the resolver did not choose the duration; it received it as part of a response and must trust the authoritative server’s judgment. The tension between freshness and load is fundamental. A CDN operator migrating traffic to a new data center wants resolvers to learn the new address within seconds — a short TTL. But a short TTL means every resolver worldwide re-queries the authoritative server every few seconds, multiplying load by the number of active resolvers. The authoritative server sets the TTL; it bears the query load; and the resolver has no way to negotiate. When a TTL is too long, resolvers serve stale data — users reach a decommissioned server. When a TTL is too short, authoritative servers face query storms after popular records expire simultaneously. Unlike DHCP, where the client and server have a direct relationship, DNS timing is set unilaterally by the authority and enforced passively by the cache.
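From the resolver's side, TTL enforcement is just a countdown attached to each cached record. The sketch below takes the clock as an explicit parameter so the countdown is visible; a real resolver reads the system clock:

```python
class TtlCache:
    """Toy resolver cache: the resolver did not choose the TTL,
    it merely counts it down."""

    def __init__(self):
        self.records = {}            # name -> (address, expiry time)

    def store(self, name, address, ttl_s, now):
        self.records[name] = (address, now + ttl_s)

    def lookup(self, name, now):
        entry = self.records.get(name)
        if entry is None or now >= entry[1]:
            return None              # miss or expired: re-query upstream
        return entry[0]
```

Note the asymmetry the text describes: `store` passively accepts whatever TTL the authoritative server chose; there is no negotiation.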
TCP: inferred time from endpoint observations. TCP has no prescribed time whatsoever. No server tells the sender how long to wait; no authority sets a timeout duration. The sender must infer timing from the only signal available: when ACKs arrive. Jacobson’s RTT estimator [Jacobson, 1988] maintains two variables — a smoothed RTT estimate (SRTT) and a smoothed mean deviation (RTTVAR) — updated with each ACK: SRTT = (1-alpha) * SRTT + alpha * SampleRTT, and RTTVAR = (1-beta) * RTTVAR + beta * |SampleRTT - SRTT|, where alpha = 1/8 and beta = 1/4. The retransmission timeout is then RTO = SRTT + 4 * RTTVAR. This formula encodes a bet: the next RTT will fall within four mean deviations of the smoothed estimate. The failure modes are symmetric and damaging. If the RTO is too low — the estimate underestimates actual delay — TCP fires spurious retransmissions. It retransmits segments that are still in flight, wasting bandwidth and potentially triggering unnecessary congestion responses (the sender halves its window for a “loss” that was not a loss). If the RTO is too high — the estimate overestimates delay — TCP stalls after genuine loss, waiting hundreds of milliseconds before retransmitting while the application sees a freeze. The difficulty is that RTT is not a fixed quantity: it varies with queue occupancy, route changes, and competing traffic. A path that had a 20ms RTT can spike to 200ms when a buffer fills. Jacobson’s estimator tracks these changes, but it tracks them with a delay — it is a low-pass filter on a noisy signal, and sudden shifts in the environment take multiple RTTs to propagate into the belief.
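The estimator translates directly into code. This sketch follows the RFC 6298 ordering, which updates RTTVAR from the old SRTT before updating SRTT itself:

```python
def jacobson_update(srtt, rttvar, sample_rtt, alpha=1/8, beta=1/4):
    """One update per ACK: smooth the RTT, smooth its mean deviation,
    and derive the retransmission timeout RTO = SRTT + 4 * RTTVAR."""
    rttvar = (1 - beta) * rttvar + beta * abs(sample_rtt - srtt)
    srtt = (1 - alpha) * srtt + alpha * sample_rtt
    return srtt, rttvar, srtt + 4 * rttvar
```

Feeding a 140 ms sample into SRTT = 100 ms, RTTVAR = 10 ms yields SRTT = 105 ms, RTTVAR = 17.5 ms, RTO = 175 ms: the spike widens the timeout, but only partially — the low-pass behavior the text describes.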
| Protocol | Prescribed vs. Inferred | What Governs Timing | Failure Mode When Timing Goes Wrong |
|---|---|---|---|
| DHCP | Prescribed — server sets lease duration, T1/T2 renewal thresholds | Lease duration (hours); T1 at 50%, T2 at 87.5% | Lease expires → client loses IP, all connections drop; too-short leases → excessive renewal traffic |
| DNS | Prescribed by authority, inferred by resolver — TTL set at authoritative server, passively obeyed by caches | TTL value per record (seconds to days) | TTL too long → stale records served, users reach wrong server; TTL too short → query storms at authoritative servers |
| TCP | Inferred — RTT estimated from ACK arrivals via Jacobson’s algorithm | RTO = SRTT + 4 * RTTVAR, updated per ACK | RTO too low → spurious retransmissions waste bandwidth; RTO too high → connection stalls after genuine loss |
The Time invariant generalizes well beyond these three protocols. OSPF’s hello interval (default 10 seconds) is prescribed time — routers declare a neighbor dead after missing a fixed number of hellos, and the interval determines how quickly the network detects a link failure. BGP’s hold timer (default 90 seconds) is similarly prescribed but far more conservative, reflecting the protocol’s preference for stability over rapid convergence. ARP cache entries expire after a timeout (typically 20 minutes on Linux), a prescribed duration that balances stale-mapping risk against broadcast overhead. Ethernet’s slot time (51.2 microseconds on 10 Mbps Ethernet) prescribes the collision detection window and thereby constrains maximum segment length. In every case, the same question applies: is the timing prescribed or inferred, and what fails when the timing is wrong?
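Most of the prescribed-time mechanisms listed above share one shape: declare failure after missing a fixed number of expected events. A generic sketch, using OSPF's defaults (10-second hellos, a dead interval of 4 hello intervals) as the illustrative parameters:

```python
def neighbor_alive(last_hello_s, now_s, hello_interval_s=10, dead_multiplier=4):
    """Prescribed-time failure detection, OSPF-style: a neighbor is dead
    once `dead_multiplier` consecutive hellos have gone missing."""
    return (now_s - last_hello_s) < hello_interval_s * dead_multiplier
```

The same function models BGP's hold timer or ARP cache expiry with different parameters — the structure of the prescribed-time answer is identical; only the constants change.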
Later chapters examine time in systems with tighter constraints: microsecond-scale slot times in wireless (Chapter 2), millisecond-scale scheduling in cellular networks (Chapter 3), and sojourn-time measurement in queue management (Chapter 5).
2.3.3 Coordination — Who Decides?
Whenever multiple actors share a resource, the system must define coordination rules. Who has authority? How do components agree? How do conflicts resolve?
TCP congestion control is distributed — each sender adjusts independently, with coordination emerging from shared bottleneck effects. DNS uses hierarchical coordination — root servers delegate to TLD servers, which delegate to authoritative servers, with caching at every level reducing the load on higher tiers. OSPF routing uses distributed coordination — each router computes shortest paths independently from link-state advertisements that all routers share. These are different points on the coordination spectrum, shaped by different constraints.
Neither distributed nor centralized coordination dominates. Centralized coordination enables global optimization: a DHCP server sees the entire address pool and allocates without conflict — but it is a single point of failure, and all clients must be able to reach it. Distributed coordination enables scale and fault tolerance: TCP works across arbitrary paths without prior arrangement — but convergence is slow and fairness is approximate. The right answer depends primarily on the anchoring constraints. The Internet’s administrative decentralization pushes toward distributed endpoints.
Trace the three focal protocols to see how different coordination structures arise from different constraints — and where each one breaks.
DHCP: centralized authority with relay extensions. DHCP is the clearest case of centralized coordination in the protocol suite. A single DHCP server holds the address pool and has sole authority to allocate addresses. The client requests; the server decides. The client may include a “requested address” option suggesting a preferred IP (typically one it held previously), but the server can ignore it. This centralization is not a stylistic choice — it is forced by the constraint that IP addresses on a subnet must be unique. Two independent allocators drawing from the same pool could assign the same address to different clients, breaking connectivity for both. A single authority eliminates the coordination problem entirely. But centralization introduces two engineering challenges. First, reach: DHCP Discover messages are LAN broadcasts, confined to the client’s local subnet. A campus with 200 subnets would need 200 DHCP servers — one per subnet — without a relay mechanism. DHCP relay agents solve this: a router on each subnet intercepts broadcast Discovers, rewrites them as unicast messages to a central DHCP server (adding a “gateway address” field so the server knows which subnet’s pool to use), and relays the response back. The relay extends centralized authority across subnet boundaries without distributing the decision. Second, availability: if the single DHCP server crashes, no new clients can obtain addresses, and existing clients cannot renew leases. DHCP failover protocols exist but are complex — two servers must synchronize their lease databases to avoid conflicting allocations, which reintroduces the very coordination problem that centralization was meant to avoid. In practice, many networks accept the single-point-of-failure risk rather than deploy failover, relying on long lease durations to mask brief server outages.
DNS: hierarchical delegation with caching as implicit coordination. DNS coordinates through hierarchical delegation — a structure forced by the Internet’s administrative decentralization. No single entity controls all domain names, so no single server can be authoritative for the entire namespace. Instead, authority is partitioned: the root zone delegates .com to Verisign’s TLD servers; Verisign delegates google.com to Google’s authoritative servers; Google delegates cloud.google.com to its cloud infrastructure’s name servers. Each level is authoritative for its own zone and explicitly delegates everything below. The delegation chain is the coordination mechanism — it defines who decides what, without requiring any two authorities to agree on anything outside their own zone. Caching adds a second, implicit layer of coordination. When a recursive resolver queries the full delegation chain and obtains an answer, it caches both the final record and the intermediate delegation records (NS records for .com, for google.com, etc.). Subsequent queries for any name under google.com skip the root and TLD servers entirely, going directly to Google’s authoritative servers. This caching reduces load on higher tiers by orders of magnitude — the root servers handle roughly 10,000 queries per second rather than the billions they would face without caching. But caching is implicit coordination: no entity decided that the resolver should absorb this load; it is an emergent consequence of TTL-governed caching interacting with query patterns. The root zone occupies a special position as the trust anchor of the entire system. The 13 root server identities (A through M) are hardcoded into every resolver’s configuration. Changing a root server — or compromising one — has global consequences. DNSSEC adds cryptographic validation to the delegation chain, turning the hierarchy into a chain of trust where each level signs the delegation to the level below. 
Coordination failure in DNS manifests as inconsistency: a resolver with a stale cached record directs users to a decommissioned server, while a resolver with a fresh record directs them correctly. The two resolvers are uncoordinated — each acts on its own cache — and the inconsistency persists until the stale TTL expires.
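A toy model of delegation-plus-caching makes the load reduction concrete. The zone data and server names below are invented, and real resolvers track TTLs and far richer record types — this is a minimal sketch assuming a three-level delegation chain:

```python
# Toy delegation tree: each "server" either delegates a suffix or answers.
# Entirely illustrative data -- not real DNS records.
ZONES = {
    "root":      {"delegate": {"com.": "tld-com"}},
    "tld-com":   {"delegate": {"google.com.": "ns-google"}},
    "ns-google": {"answer":   {"www.google.com.": "216.58.193.196"}},
}

class Resolver:
    def __init__(self):
        self.cache = {}      # name suffix -> server authoritative below it
        self.queries = []    # which servers we actually contacted

    def resolve(self, name):
        # Start from the deepest (longest-suffix) cached delegation, else the root.
        best, server = "", "root"
        for suffix, srv in self.cache.items():
            if name.endswith(suffix) and len(suffix) > len(best):
                best, server = suffix, srv
        while True:
            self.queries.append(server)
            zone = ZONES[server]
            if name in zone.get("answer", {}):
                return zone["answer"][name]
            for suffix, child in zone.get("delegate", {}).items():
                if name.endswith(suffix):
                    self.cache[suffix] = child   # cache the NS delegation
                    server = child
                    break
            else:
                return None                      # no delegation matches
```

The first resolution walks root → TLD → authoritative; a repeat query hits only the authoritative server, because the cached delegation lets the resolver skip the upper tiers.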
TCP: distributed coordination through emergent fairness. TCP is the polar opposite of DHCP. There is no central authority, no delegation hierarchy, no explicit communication between competing flows. Each TCP sender adjusts its sending rate based solely on its own observations — ACK arrivals, timeouts, ECN marks — with no knowledge of how many other flows share the bottleneck, what their rates are, or what the bottleneck’s capacity is. Coordination is entirely emergent. AIMD (additive increase, multiplicative decrease) is the mechanism: each sender increases its congestion window by roughly 1 MSS per RTT (additive increase) and halves it upon detecting loss (multiplicative decrease). Chiu and Jain [1989] proved that AIMD converges to fairness — if two flows sharing a bottleneck start at arbitrary rates, repeated rounds of additive increase followed by multiplicative decrease drive them toward equal shares. The proof relies on a geometric argument: additive increase moves the system along a 45-degree line (both flows grow equally), while multiplicative decrease moves it toward the origin along a line through the current operating point. The intersection of these trajectories converges to the fairness line. But convergence is slow and approximate. Flows with different RTTs converge to different shares — a flow with a 10ms RTT increases its window 10 times faster than a flow with a 100ms RTT, because it completes 10 additive-increase rounds in the same wall-clock time. This RTT bias is a fundamental consequence of distributed coordination: without a central scheduler that knows both flows’ RTTs, there is no mechanism to compensate. The lack of explicit coordination also means TCP cannot distinguish between different causes of packet loss. A packet dropped due to congestion at a bottleneck router and a packet dropped due to a bit error on a wireless link look identical to the sender — both manifest as a missing ACK. 
The sender responds to both with multiplicative decrease, even though only congestion loss warrants a rate reduction. This ambiguity is a direct cost of distributed coordination: a centralized scheduler at the bottleneck could distinguish congestion drops from other drops, but TCP’s endpoint-only design has no such visibility.
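Chiu and Jain’s convergence argument is easy to see numerically. The following minimal simulation — illustrative parameters, synchronized loss, equal RTTs — shows two flows that start far apart converging toward equal shares:

```python
def aimd(x1, x2, capacity, rounds=200):
    """Two AIMD flows sharing one bottleneck; loss is synchronized when
    their combined rate exceeds capacity. Units are arbitrary."""
    for _ in range(rounds):
        x1 += 1.0; x2 += 1.0          # additive increase: move along the 45-degree line
        if x1 + x2 > capacity:        # shared loss event at the bottleneck
            x1 *= 0.5; x2 *= 0.5      # multiplicative decrease toward the origin
    return x1, x2

# Flows starting at 90 and 5 units converge toward equal shares of a
# 100-unit bottleneck: each halving also halves the gap between them,
# while additive increase leaves the gap unchanged.
a, b = aimd(90.0, 5.0, capacity=100)
```

Note what the sketch omits: with unequal RTTs the two flows would take additive-increase steps at different wall-clock rates, and the fixed point would shift away from the fair share — the RTT bias described above.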
| Protocol | Coordination Style | Who Decides | What Happens When Coordination Fails |
|---|---|---|---|
| DHCP | Centralized — single server holds address pool | DHCP server; client requests, server grants or denies | Server crash → no new leases, renewals fail; dual servers without failover → address conflicts |
| DNS | Hierarchical delegation — each zone authoritative for its names | Each authoritative server for its zone; caching resolvers obey TTLs | Stale caches → inconsistent answers across resolvers; root compromise → global trust failure |
| TCP | Distributed — no central authority, no inter-flow communication | Each sender independently, based on local ACK observations | RTT-biased fairness; congestion/non-congestion loss conflation; slow convergence with many flows |
The coordination spectrum extends across every protocol students encounter in an introductory course. OSPF uses distributed coordination — every router floods link-state advertisements and independently computes shortest paths, converging to consistent forwarding tables without any central authority. BGP adds policy-driven coordination — each autonomous system makes routing decisions based on local preferences, coordinating only through route announcements, which means global convergence is not guaranteed and routing anomalies can persist indefinitely. ARP uses no coordination at all — any host can claim any IP-to-MAC mapping, which is why ARP spoofing is trivially easy. Ethernet switching uses a hybrid: the spanning tree protocol elects a root bridge (centralized decision) but each switch independently computes its forwarding state from the elected topology (distributed execution). In every case, the coordination structure reflects the constraints: who has authority, what information is available to each decision-maker, and what is the cost of disagreement.
Later chapters show how shared-medium physics pushes wireless systems toward centralization at scale (Chapters 2-3).
2.3.4 Interface — How Do Components Interact?
Every system defines interfaces between layers or modules. TCP presents a reliable byte stream to applications and uses unreliable datagrams from IP. The socket API is one of the most successful interfaces in computing — applications operate without awareness of packets, loss, or reordering.
Interfaces are the most path-dependent design choice. The IP packet interface was inherited from the Internet’s original architecture [Clark, 1988]. It constrains everything built above it — TCP must infer congestion from endpoint observations precisely because IP routers expose only drop/forward behavior. Changing this interface is expensive: ECN took decades to deploy [Ramakrishnan et al., 2001]. But the interface evolves at the margins — QUIC moved transport over UDP to avoid middlebox ossification [Langley et al., 2017].
The thin waist — a narrow interface decoupling diverse things above from diverse things below — is among the most consequential architectural patterns in networking [Deering, 1998]. IP’s datagram interface allows any link technology below and any application above to evolve independently. But the same stability that makes an interface valuable makes it resistant to change. Middleboxes that inspect TCP headers ossify the transport layer; adding a new TCP option becomes a multi-year deployment battle. QUIC’s response — encrypt the transport header and run over UDP — is an interface renegotiation forced by the ossification of the previous one.
Walk through the three focal protocols to see how interface choices are shaped by constraints — and how interfaces, once deployed, resist change.
DHCP: UDP broadcast forced by the bootstrap constraint. DHCP’s interface is the most constrained of the three, because DHCP must operate before the client has the configuration that every other protocol assumes. The client has no IP address — so it cannot send a unicast IP packet. It has no knowledge of the DHCP server’s address — so it cannot target a specific destination. The only interface available is the Ethernet broadcast: destination MAC FF:FF:FF:FF:FF:FF, source IP 0.0.0.0, destination IP 255.255.255.255. This is the chicken-and-egg problem of network bootstrapping: the client needs IP to communicate, but needs DHCP to get IP. The protocol resolves it by dropping below the IP layer’s normal assumptions — using broadcast at Layer 2 to reach a server whose identity is unknown. The encapsulation stack is revealing: DHCP messages ride inside UDP (port 67 server, port 68 client), inside IP, inside Ethernet. UDP is chosen over TCP because TCP requires a connection, and connections require IP addresses on both sides. The interface above DHCP is implicit — the operating system’s network configuration subsystem consumes DHCP’s output (IP address, subnet mask, default gateway, DNS server) and configures the IP stack. There is no formal API; DHCP is a configuration protocol, not a data transport protocol. The interface below is raw Ethernet broadcast — the most primitive communication mechanism available on a LAN. This interface choice has consequences: DHCP Discover messages are confined to the local broadcast domain, which is why DHCP relay agents are needed for multi-subnet deployments. The relay agent converts a broadcast interface into a unicast one, extending the server’s reach without changing the client’s behavior.
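A minimal sketch of the Discover message makes the bootstrap constraints visible in the packet fields themselves. Field offsets follow RFC 2131’s BOOTP layout; the transaction ID and MAC address are invented, and real clients include additional options (parameter request list, client identifier):

```python
import struct

def build_dhcp_discover(mac: bytes, xid: int = 0x3903F326) -> bytes:
    """Minimal DHCPDISCOVER payload (BOOTP fixed header + options).
    Field layout per RFC 2131; the values here are illustrative."""
    assert len(mac) == 6
    pkt = struct.pack(
        "!BBBBIHH4s4s4s4s",
        1,            # op = BOOTREQUEST
        1, 6, 0,      # htype=Ethernet, hlen=6, hops=0 (relays increment hops)
        xid,          # transaction ID chosen by the client
        0, 0x8000,    # secs; flags with broadcast bit set: no IP to unicast to
        b"\x00" * 4,  # ciaddr = 0.0.0.0 -- the client has no address yet
        b"\x00" * 4,  # yiaddr ("your" address, filled in by the server)
        b"\x00" * 4,  # siaddr
        b"\x00" * 4,  # giaddr -- a relay agent would write its subnet here
    )
    pkt += mac + b"\x00" * 10      # chaddr (16 bytes)
    pkt += b"\x00" * 192           # sname + file fields (unused here)
    pkt += b"\x63\x82\x53\x63"     # DHCP magic cookie
    pkt += bytes([53, 1, 1])       # option 53 (message type) = 1 (DISCOVER)
    pkt += bytes([255])            # end option
    return pkt

# The client would then broadcast this from 0.0.0.0:68 to 255.255.255.255:67:
#   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
#   s.bind(("0.0.0.0", 68)); s.sendto(pkt, ("255.255.255.255", 67))
```

Every zeroed field is a consequence of the bootstrap constraint: the client cannot name itself (ciaddr), its server (siaddr), or its relay (giaddr) because it knows none of them yet.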
DNS: UDP with TCP fallback and the pressure to renegotiate. DNS’s interface was designed for a specific constraint: name resolution queries and responses are typically small (a few hundred bytes), latency-sensitive (users wait for resolution before any application data flows), and stateless (each query is independent). UDP on port 53 fits perfectly — no connection setup overhead, one query packet out, one response packet back. But the original specification imposed a hard constraint: UDP DNS responses must fit in 512 bytes. This limit reflected the 576-byte minimum datagram size that every IP host must be able to reassemble: 512 bytes of DNS payload fits within it even after IP and UDP headers, with margin to spare. Responses exceeding 512 bytes — common for zones with many records, DNSSEC signatures, or large TXT records — trigger a fallback to TCP. The client re-issues the query over a TCP connection to port 53, incurring the cost of a three-way handshake for a single response. EDNS0 (Extension Mechanisms for DNS, RFC 6891) relaxed the 512-byte limit by allowing the client to advertise a larger buffer size (typically 4096 bytes) in the query, enabling larger UDP responses without TCP fallback. But EDNS0 is a negotiation — middleboxes and firewalls that do not understand it may drop oversized UDP packets, creating silent failures. The most dramatic interface renegotiation in DNS is DNS-over-HTTPS (DoH, RFC 8484) and DNS-over-TLS (DoT, RFC 7858). Traditional DNS queries are sent in plaintext over UDP — visible to every network device on the path. ISPs can inspect, log, and even modify DNS responses (a practice used for censorship and ad injection). DoH tunnels DNS queries inside HTTPS on port 443, making them indistinguishable from normal web traffic. This is an interface renegotiation driven by privacy constraints: the technical content of the query is unchanged, but the interface between the resolver and the recursive server shifts from plaintext UDP to encrypted HTTPS.
The consequences cascade — network operators lose visibility into DNS traffic, enterprise firewalls cannot enforce DNS-based security policies, and the trust model shifts from the network path to the recursive resolver operator (typically a large provider like Cloudflare or Google).
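The size negotiation surfaces in a single header bit: a resolver checks the TC (truncated) flag in a UDP response and, if it is set, retries the query over TCP. A minimal sketch, with the header layout taken from the DNS specification (make_header is a test helper, not a real responder):

```python
import struct

DNS_TC = 0x0200   # "truncated" flag (bit 9) in the DNS header flags field

def needs_tcp_fallback(response: bytes) -> bool:
    """A resolver retries over TCP when a UDP response arrives with TC=1,
    meaning the answer did not fit the allowed UDP payload size."""
    if len(response) < 12:                       # 12-byte fixed DNS header
        raise ValueError("short DNS response")
    _, flags = struct.unpack("!HH", response[:4])
    return bool(flags & DNS_TC)

def make_header(qid: int, truncated: bool) -> bytes:
    """Illustrative 12-byte DNS header: QR=1 (response), optional TC bit,
    one question, no answer records."""
    flags = 0x8000 | (DNS_TC if truncated else 0)
    return struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
```

EDNS0 changes only the threshold at which the server sets TC=1; the fallback logic itself is unchanged, which is why middleboxes that silently drop large UDP responses (rather than truncating them) cause failures this check cannot detect.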
TCP: the abstraction gap between byte streams and datagrams. TCP’s interface is a two-sided abstraction. Above, it presents the socket API — connect(), send(), recv(), close() — exposing a reliable, ordered byte stream. An application writes bytes into the socket and reads bytes out; it never sees packets, never handles retransmissions, never deals with reordering. Below, TCP uses IP’s datagram interface — unreliable, unordered packets with a maximum size (MTU, typically 1500 bytes on Ethernet minus headers). The abstraction gap between these two interfaces is enormous. The application sees a continuous stream; the network delivers discrete, lossy, reorderable packets. TCP bridges this gap through segmentation (breaking the byte stream into MSS-sized segments), sequence numbering (tracking the byte offset of each segment), acknowledgment (confirming receipt), retransmission (resending unacknowledged segments), and reordering (buffering out-of-order segments until gaps are filled). This abstraction gap is the source of TCP’s complexity — and its value. Applications can be written as if the network were a perfect pipe, because TCP absorbs all the imperfections. But the abstraction also hides information that some applications need. A video streaming client might prefer to skip a lost frame rather than wait for a retransmission that arrives too late to display — but the socket API provides no mechanism to express this preference. The byte stream abstraction forces reliability on all data equally. This is why UDP exists as an alternative: it exposes the datagram interface directly to the application, letting the application make its own reliability/latency tradeoffs. The socket API is also the site of the Internet’s most consequential ossification. Because the API is implemented in the OS kernel, and because every application on the system depends on it, changing TCP’s behavior requires kernel updates — a slow, coordinated process across millions of devices. 
QUIC’s decision to implement transport in user space, over UDP, is a direct response to this ossification: by moving above the kernel’s socket layer, QUIC can iterate on transport mechanisms without waiting for OS vendors to ship kernel patches.
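The receiver’s side of the abstraction gap — turning lossy, reorderable datagrams back into an ordered byte stream — can be sketched as a reassembly buffer. This toy version ignores overlapping segments, flow control, and window advertisement:

```python
class ReassemblyBuffer:
    """Receiver-side sketch of how TCP rebuilds a byte stream:
    out-of-order segments are held until the gap before them is filled."""
    def __init__(self, isn=0):
        self.next_seq = isn      # next byte offset the application expects
        self.ooo = {}            # seq -> payload, segments waiting on a gap
        self.stream = b""        # in-order bytes delivered to the application

    def receive(self, seq, payload):
        self.ooo[seq] = payload
        # Deliver every segment that is now contiguous with the stream.
        while self.next_seq in self.ooo:
            data = self.ooo.pop(self.next_seq)
            self.stream += data
            self.next_seq += len(data)
        return self.next_seq     # cumulative ACK: first byte still missing
```

The return value is the cumulative ACK — notice that a segment arriving ahead of a gap advances nothing, which is exactly the head-of-line blocking that makes the byte-stream abstraction costly for latency-sensitive applications.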
| Protocol | Abstraction Above | Abstraction Below | Ossification Risk |
|---|---|---|---|
| DHCP | OS network configuration (implicit — no formal API) | Ethernet broadcast + UDP (forced by bootstrap: no IP address yet) | Low — DHCP operates only at boot/renewal; few middleboxes inspect it |
| DNS | Application name resolution (getaddrinfo(), stub resolver API) | UDP port 53 (TCP fallback for large responses); evolving to DoH/DoT | Moderate — plaintext DNS is inspected/modified by ISPs and firewalls; DoH renegotiates the interface |
| TCP | Socket API: reliable ordered byte stream (connect, send, recv) | IP datagrams: unreliable, unordered, size-limited packets | High — kernel implementation + middlebox header inspection ossify both the API and the wire format |
The Interface invariant is visible in every protocol layering decision students have encountered. ARP defines an interface between IP addresses and MAC addresses — a simple query/response protocol that bridges the network and link layers. Ethernet defines a frame format interface: source MAC, destination MAC, EtherType, payload, FCS — a contract that every NIC and switch on the LAN must honor, and one that has remained stable since 1980 even as speeds increased from 10 Mbps to 400 Gbps. OSPF and BGP define interfaces between routing domains: OSPF’s link-state advertisements are the interface through which routers share topology information within an AS, while BGP’s route announcements are the interface through which autonomous systems share reachability information across administrative boundaries. In each case, the interface enables independent evolution on both sides — but the interface itself resists change precisely because both sides depend on it.
2.4 Design Principles: Strategies for Answering Well
Invariants define what must be answered. Three recurring principles describe how good answers are constructed. They are recurring solution patterns — strategies that successful systems employ under constraints. Invariants are structural: every system answers them. Principles are strategic: they capture what works. The distinction matters: invariants define the design space; principles guide navigation within it.
2.4.1 Disaggregation — Separation of Concerns Under Constraints
Separating coupled concerns into independently controllable dimensions. The Internet’s protocol stack is the canonical example: it separates naming (DNS) from addressing (IP) from routing (OSPF/BGP) from reliable delivery (TCP) from application logic (HTTP). Each layer can evolve independently — HTTP moved from 1.0 to 1.1 to 2 without changing TCP or IP, and HTTP/3 swapped TCP for QUIC without changing IP at all. IPv4 is being replaced by IPv6 without rewriting applications. This independence would be impossible if the entire stack were a monolithic design.
The cost is interface overhead. Every separation boundary introduces a coordination signal, and that signal can degrade. The layered stack forces TCP to infer congestion from endpoint observations because IP’s interface hides router queue state. A richer interface — routers explicitly signaling congestion via ECN — took decades to deploy [Ramakrishnan et al., 2001], precisely because changing an interface between independently evolving layers is expensive. Later chapters show disaggregation at other scales: 5G base station decomposition (Chapter 3) and control/data plane separation via SDN (Chapter 5).
The three focal protocols demonstrate disaggregation concretely — and reveal what would break if the separations did not exist.
DNS separates naming from addressing from routing. A domain name (www.google.com) is not an IP address (216.58.193.196), and an IP address is not a route. These three concepts — name, address, route — could in principle be fused into a single identifier. But disaggregation makes each independently changeable. Google can change the IP address behind www.google.com by updating a single DNS record; every client that re-resolves the name reaches the new address without any routing change. Google can move traffic between data centers by changing DNS responses — no BGP announcement needed, no IP renumbering required. What would break if naming and addressing were coupled? Changing an IP address would require simultaneously updating every name record that references it, and changing a name would require updating IP routing tables. In practice, this coupling would make CDN load balancing, failover, and migration — operations that happen thousands of times per second across the Internet — impossible at scale. The disaggregation of naming from addressing is what makes the modern web’s infrastructure agility possible.
TCP separates reliable delivery from routing. TCP guarantees ordered, reliable delivery to the application. IP routes packets toward a destination. These concerns are separated: TCP does not know or care which path its segments traverse, and IP does not know or care whether a packet belongs to a reliable stream or a best-effort datagram. What would break if TCP had to know the path? Every route change — caused by a link failure, a BGP policy update, or a load-balancing decision — would break active TCP connections, because the transport layer’s state would be coupled to a specific path. The separation allows IP to reroute packets mid-connection (as happens routinely in the Internet) without disrupting TCP’s reliability guarantees. The cost of this separation is that TCP cannot optimize for the path — it cannot distinguish a satellite link (high latency, low loss) from a fiber link (low latency, low loss) without inferring path characteristics from endpoint observations.
DHCP separates address allocation from address use. The DHCP server manages the address pool — which addresses are available, which are leased, when leases expire. The client uses the assigned address for all subsequent communication but has no knowledge of or responsibility for pool management. The client does not know how many addresses the pool contains, which other addresses are in use, or what the allocation policy is. What would break if allocation and use were coupled? If clients had to participate in address management — tracking which addresses are available, resolving conflicts, coordinating with other clients — the bootstrap problem would become intractable. A new client with no network configuration would need to coordinate with existing clients to find an available address, but coordination requires the network configuration it does not yet have. DHCP’s disaggregation resolves this: the server handles allocation (a problem requiring global state), and the client handles use (a problem requiring only local state).
2.4.2 Closed-Loop Reasoning — How Decisions Adapt Over Time
The dynamical discipline that evaluates whether the combined invariant answers produce stable, convergent behavior. Closed-loop reasoning is not an afterthought applied at evaluation time — it is an analytical lens present throughout design.
Every adaptive protocol you studied in an introductory course is a feedback loop. TCP’s AIMD is the canonical example: observe ACK stream → infer congestion → adjust sending rate → observe again. The loop period is one RTT; the gain is 1 MSS per RTT on increase, 50% on decrease. DNS caching is a simpler feedback loop: resolve a name → cache the result → serve from cache until TTL expires → re-resolve. If the TTL is too long, the cache serves stale data — the belief diverges from the environment.
The question closed-loop reasoning asks is always the same: given these invariant answers, will the system converge? What is the loop’s bandwidth — how fast can it track environmental changes? Where are the oscillation modes? TCP’s congestion control can oscillate: if many flows sharing a bottleneck all detect loss simultaneously, they all halve their windows together, then ramp up together — a phenomenon called global synchronization that wastes link capacity. Later chapters examine feedback loops in systems with different dynamics: exponential backoff in wireless medium access (Chapter 2), queue management algorithms that reshape the congestion signal (Chapter 5), and model-based congestion control that restructures the loop entirely (Chapter 4).
Trace the feedback loop concretely for each focal protocol to see how loop structure determines system behavior.
TCP AIMD: fast loop, noisy signal, oscillation risk. The loop operates as follows: the sender transmits segments and observes ACK arrivals. From ACK timing, it estimates available capacity (the belief). It adjusts the congestion window — increasing by 1 MSS per RTT when ACKs arrive normally (additive increase), halving when a loss is detected (multiplicative decrease). Then it observes again. The loop period is one RTT — typically 10–200ms on wide-area paths. The gain is asymmetric: slow increase (linear), fast decrease (multiplicative). This asymmetry is deliberate — it makes the system probe cautiously but retreat aggressively from congestion, which stabilizes the bottleneck queue. What causes oscillation? When many flows share a bottleneck, they all probe simultaneously, all trigger loss at roughly the same time, and all retreat together. The bottleneck alternates between overload (all flows probing) and underutilization (all flows retreating) — global synchronization. Random Early Detection (RED) and its successors address this by dropping packets probabilistically before the queue is full, desynchronizing the loss signals across flows. The deeper point: TCP’s feedback loop is well-designed for a single flow but exhibits emergent oscillation when multiple loops interact through a shared bottleneck. The interaction between loops is a recurring theme in Chapters 4 and 5.
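RED’s desynchronizing drop policy is a single function of the averaged queue length. A sketch with illustrative thresholds — real deployments compute avg_queue as an EWMA of instantaneous queue length and use gentler ramps above max_th:

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p=0.1):
    """RED sketch: probability of dropping (or ECN-marking) an arriving
    packet as a function of the averaged queue length."""
    if avg_queue < min_th:
        return 0.0        # queue is healthy: no early drops
    if avg_queue >= max_th:
        return 1.0        # queue nearly full: drop everything arriving
    # Linear ramp between the thresholds: flows see loss at different,
    # random times, which breaks the global synchronization described above.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Because each flow’s packets are dropped independently at random, the flows no longer halve their windows in lockstep — the probabilistic signal decorrelates the interacting feedback loops.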
DNS TTL: slow loop, unilateral control, staleness risk. The DNS feedback loop is structurally simpler: a resolver queries an authoritative server, receives a record with a TTL, caches the record, serves it from cache until the TTL expires, then re-queries. The loop period equals the TTL — which can range from 30 seconds to 86,400 seconds (24 hours), set unilaterally by the zone administrator. The resolver has no control over the loop period; it can only obey the TTL or violate the protocol. What happens when the loop is too slow? If a server migrates to a new IP address and the TTL is 24 hours, resolvers worldwide will continue directing traffic to the old address for up to a day. Users see failures; operators have no mechanism to accelerate the flush. This is why operators planning a migration first lower the TTL (say to 60 seconds) well in advance, wait for the old TTL to expire across all caches, perform the migration, then raise the TTL again. What happens when the loop is too fast? If the TTL is 30 seconds and the domain is popular (queried by millions of resolvers), the authoritative server faces a query storm every 30 seconds as caches expire nearly simultaneously. The load scales with popularity divided by TTL — a tradeoff with no free parameter. The DNS loop has no adaptive gain: the system cannot speed up when freshness matters or slow down when load is high. The TTL is a static parameter, not a dynamic control. This rigidity is a direct consequence of DNS’s coordination structure — the authority sets the TTL, and the resolver obeys.
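The load-versus-freshness tradeoff falls directly out of the loop period. A toy TTL cache driven by a fake clock shows upstream load scaling as query rate divided by TTL (the cache contents are elided; only the expiry bookkeeping matters here):

```python
class TtlCache:
    """DNS-style cache loop: serve from cache until the TTL expires,
    then re-query upstream. Upstream load ~ query rate / TTL."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.expiry = {}           # name -> time the cached record dies
        self.upstream_queries = 0

    def resolve(self, name, now):
        if self.expiry.get(name, -1) <= now:   # stale or missing: re-query
            self.upstream_queries += 1
            self.expiry[name] = now + self.ttl
        return "cached-answer"                 # record payload elided

# One client querying every 10 seconds for an hour: a 30-second TTL
# sends two orders of magnitude more load upstream than a 1-hour TTL.
short, long_ = TtlCache(ttl=30), TtlCache(ttl=3600)
for t in range(0, 3600, 10):
    short.resolve("example.com", t)
    long_.resolve("example.com", t)
```

The sketch also exposes the rigidity the paragraph describes: the loop period is a constructor constant, with no mechanism for the cache to adapt it to freshness needs or upstream load.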
DHCP lease renewal: escalating loop with catastrophic failure. DHCP’s feedback loop has three stages with escalating urgency. Stage 1: the client obtains a lease and uses the address. At T1 (50% of lease duration), the client unicasts a renewal request to the original server. If the server responds with a fresh lease, the loop resets — the client has confirmed that its address is still valid and the server is still reachable. Stage 2: if the server does not respond by T2 (87.5% of lease duration), the client broadcasts a renewal to any DHCP server on the LAN. This escalation from unicast to broadcast reflects increasing urgency — the original server may have failed, and the client needs any authority to confirm its address. Stage 3: if no server responds by lease expiration, the client must relinquish the address and restart discovery from scratch. The failure mode is catastrophic: all active TCP connections drop, all application sessions terminate, and the client is temporarily unreachable. The loop period is the lease duration — hours on most networks. What happens when the server is unreachable at renewal? If the outage is brief (resolved before T2), the client renews normally at the next attempt. If the outage spans T1 to T2, the client escalates to broadcast. If the outage spans the entire lease, the client loses its address. The lease duration is thus a bet on server availability: short leases detect server failure faster but increase the risk of address loss during brief outages; long leases tolerate outages better but delay address reclamation from departed clients.
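The three-stage escalation reduces to comparing elapsed lease time against the T1/T2 timers. A sketch using the RFC 2131 default fractions — real clients jitter these timers and retransmit within each stage:

```python
def dhcp_renewal_state(elapsed, lease):
    """Which stage of the renewal loop a client is in, given seconds
    elapsed since the lease was granted. Defaults per RFC 2131:
    T1 = 50% of the lease, T2 = 87.5%."""
    t1, t2 = 0.5 * lease, 0.875 * lease
    if elapsed < t1:
        return "BOUND"       # address valid, no renewal traffic yet
    if elapsed < t2:
        return "RENEWING"    # unicast renewal to the original server
    if elapsed < lease:
        return "REBINDING"   # broadcast: any server may confirm the lease
    return "INIT"            # lease expired: restart discovery; connections drop
```

The state names mirror the escalation in the text: unicast (RENEWING) trusts the original server is still there; broadcast (REBINDING) asks any authority; INIT is the catastrophic restart.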
2.4.4 A Preview: The Fundamental Limitation of Carrier Sense
The three design principles — disaggregation, closed-loop reasoning, and decision placement — provide a vocabulary for analyzing how systems navigate constraints. Before moving to the anchored dependency graph, one concrete example illustrates how these principles interact. Carrier sense in wireless networks is a coordination mechanism that relies on local measurement to infer remote conditions. When the measurement medium is spatially non-uniform (as electromagnetic propagation always is), the local observation at the sender diverges from the reality at the receiver. This single limitation produces two distinct failure modes that Chapter 2 examines in detail.
This pattern – local measurement failing to capture remote reality – recurs throughout networked systems. TCP infers congestion from endpoint observations that may not reflect the actual queue state at a bottleneck router. DNS caching serves records that may have changed at the authoritative server. In each case, the design question is the same: how much can local observation tell you about conditions elsewhere, and what mechanisms compensate when it falls short?
2.5 The Anchored Dependency Graph
Invariants do not live in isolation. In any given system, one or two anchoring choices constrain the feasible answers to the remaining invariants. The ordering varies by system. Anchors come from several sources: physics (shared-medium propagation in WiFi), legacy interfaces (IP’s best-effort datagram service), institutional boundaries (administrative decentralization of the Internet), or deployment realities (commodity hardware constraints). What makes something an anchor is that it is harder to change than the invariant answers it constrains.
Anchor choice(s) (physics, architecture, legacy, deployability)
    ↓
Constrain feasible answers to the other invariants
    ↓
Design principles guide choices within constraints
    ↓
Closed-loop dynamics shape system behavior
    ↓
Emergent properties (fairness, scaling, failure modes)
This cascade of constraints — anchor constraining invariants, invariants shaping principles, principles guiding implementation — is visualized in Figure 2.6. The anchored dependency graph is the method: given any system, identify the anchor, trace the constraints, understand how principles guide the choices within those constraints, and evaluate the resulting closed-loop dynamics.
Three worked examples demonstrate the method by tracing, step by step, how the anchor forces each subsequent invariant answer.
TCP worked example. Start from the anchor: IP provides unreliable datagrams across administratively decentralized paths. No single entity controls the paths that TCP segments traverse; no router is obligated to report its queue state; no central scheduler allocates bandwidth. From this anchor, trace the cascade:
- Coordination (forced by the anchor): Administrative decentralization means no central scheduler can observe all flows at a bottleneck. TCP must coordinate in a distributed fashion — each sender adjusts independently, with no knowledge of competing flows.
- State (forced by distributed coordination): Because no central entity aggregates information, each endpoint must build its own model of the path. The congestion window (cwnd) is the sender’s belief about available capacity, constructed from a measurement signal (ACK arrivals and their timing) that partially and noisily reflects the environment (bottleneck queue occupancy, competing traffic). This is endpoint-local state — the only kind available when coordination is distributed.
- Time (forced by endpoint-local state): Because the sender has no external clock or timing authority, it must infer time from its own observations. Jacobson’s RTT estimator computes SRTT and RTTVAR from ACK arrivals. The retransmission timeout (RTO = SRTT + 4*RTTVAR) is the sender’s best guess at when to declare a segment lost. No server prescribes this timeout; it emerges from measurement.
- Interface (forced by the anchor and all prior choices): Above, TCP presents a reliable byte stream — the socket API that hides packets, loss, and reordering from applications. Below, it uses IP’s datagram service. The interface above is the reward of all the complexity below; the interface below is the constraint that made all the complexity necessary.
Each step is forced by the previous one. A different anchor — say, a centralized network with a single administrator — would produce a completely different cascade (centralized coordination, global state, prescribed time), which is exactly what we see in datacenter TCP variants like DCTCP.
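The Time step of the cascade — inferring timeouts from measurement alone — is concrete enough to sketch. This follows the SRTT/RTTVAR update rules of RFC 6298; the 1-second minimum RTO and the clock-granularity term are elided for brevity:

```python
class RttEstimator:
    """Jacobson/Karels RTT estimation as standardized in RFC 6298:
    SRTT and RTTVAR are exponentially weighted moving averages of the
    measured RTT and its deviation. All times in seconds."""
    ALPHA, BETA = 1 / 8, 1 / 4       # standard smoothing gains

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def sample(self, rtt):
        if self.srtt is None:        # first measurement initializes both
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RFC 6298 order: update RTTVAR using the old SRTT, then SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.srtt + 4 * self.rttvar    # RTO = SRTT + 4*RTTVAR
```

No server prescribes these timers; the sender constructs them entirely from its own ACK observations — which is exactly what the distributed anchor forces.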
DNS worked example. Start from the anchor: the global namespace is too large for one server (billions of records) and administratively decentralized (millions of independent organizations control their own names). From this anchor:
- Coordination (forced by the anchor): No single entity has the authority or capacity to resolve all names. Coordination must be hierarchical — the root zone delegates to TLD servers, TLD servers delegate to authoritative servers. Each level is authoritative for its own zone and delegates everything below. This hierarchy partitions the namespace to match administrative reality.
- State (forced by hierarchical coordination): Because resolution traverses multiple levels, and because queries are frequent while namespace changes are rare, caching at every level is essential. Each resolver maintains a cache of recent query results. Each cached record is a belief — a snapshot of a name-to-address mapping that was true at query time but may have changed since. State is distributed across thousands of caching resolvers worldwide, with no central cache.
- Time (forced by distributed cached state): Cached records must expire; otherwise stale beliefs persist indefinitely. The TTL mechanism provides prescribed expiration — the authoritative server sets the TTL, and every cache obeys it. The authority controls the tradeoff between freshness (short TTL) and load (long TTL). Resolvers have no ability to negotiate or override.
- Interface (forced by all prior choices): UDP queries on port 53 for speed and simplicity, with TCP fallback for large responses. The hierarchical resolution process — iterative queries from resolver to root to TLD to authoritative — is the interface between the resolver and the distributed namespace. DNS-over-HTTPS and DNS-over-TLS represent interface renegotiations driven by privacy constraints that did not exist when the original interface was designed.
DHCP worked example. Start from the anchor: IP addresses must be unique on a subnet, and the client has no IP address at bootstrap. From this anchor:
- Coordination (forced by the uniqueness constraint): Two independent allocators could assign the same address to different clients, breaking both. A single allocator — the DHCP server — eliminates the coordination problem entirely. Coordination is centralized.
- State (forced by centralized coordination): The server maintains the address pool and a lease database — which addresses are available, which are assigned, to whom, and until when. This is centralized state, co-located with the centralized decision-maker.
- Time (forced by centralized state and the need for address reclamation): Addresses must be reclaimed when clients depart. But the server cannot know when a client leaves (the client may crash without notifying the server). Lease duration solves this: the server prescribes a time limit, and the client must renew or lose the address. The timing is fully prescribed by the authority.
- Interface (forced by the bootstrap constraint): The client has no IP address, so it cannot send unicast. The only available interface is UDP broadcast on the LAN — the most primitive communication mechanism available, but the only one accessible to an unconfigured client.
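The centralized State and prescribed Time answers can be sketched as a lease table. This is an illustrative model rather than the DHCP wire protocol; the pool contents, client identifiers, and lease duration are placeholders.

```python
class LeaseTable:
    """Minimal sketch of a DHCP server's centralized lease database:
    the server's belief about who holds each address, bounded by
    lease expiry."""

    def __init__(self, pool, lease_seconds):
        self.free = list(pool)
        self.leases = {}  # address -> (client_id, expiry)
        self.lease_seconds = lease_seconds

    def reclaim(self, now):
        # Expiry is the only reclamation signal the server has: a crashed
        # client never says goodbye, so time does it on the client's behalf.
        for addr, (_, expiry) in list(self.leases.items()):
            if now >= expiry:
                del self.leases[addr]
                self.free.append(addr)

    def allocate(self, client_id, now):
        self.reclaim(now)
        if not self.free:
            return None  # pool exhausted
        addr = self.free.pop()
        self.leases[addr] = (client_id, now + self.lease_seconds)
        return addr

    def renew(self, addr, client_id, now):
        holder = self.leases.get(addr)
        if holder and holder[0] == client_id:
            self.leases[addr] = (client_id, now + self.lease_seconds)
            return True
        return False
```

Because allocation and state live in one place, the uniqueness constraint is enforced trivially: an address is either in `free` or in `leases`, never both.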
Notice the structural difference: TCP’s cascade starts from a distributed anchor and produces distributed answers throughout. DHCP’s cascade starts from a centralization constraint and produces centralized answers throughout. DNS’s cascade starts from a scaling-plus-decentralization constraint and produces hierarchical answers throughout. The anchor determines the character of the entire system.
SDN starts from a coordination goal — centralized visibility. BBR starts from a state redesign — explicit path model. QUIC starts from an interface renegotiation — transport over UDP. The method is the same: identify the anchor, trace the cascade, evaluate the dynamics. Later chapters trace the dependency graph for systems with different anchors: shared-medium physics in wireless (Chapter 2), finite buffers in queue management (Chapter 5), and human perceptual requirements in multimedia applications (Chapter 7).
2.6 Decomposition: Why Systems Have Parts
The Internet is one system with a single global purpose: deliver data between endpoints in a way that satisfies user needs. But no one designs the Internet as a single system. It is decomposed — broken into subsystems that each handle a piece of the problem, coordinating through well-defined interfaces.
This decomposition is not accidental. Chiang, Low, Calderbank, and Doyle (Chiang et al. 2007) showed that the Internet can be modeled as a global optimization problem called Network Utility Maximization (NUM): maximize aggregate user satisfaction subject to physical constraints (link capacities, spectrum, propagation). The key insight is that decomposing this optimization problem into subproblems is equivalent to choosing a network architecture. Each subproblem becomes a protocol or functional module. The coordination signals between subproblems — the prices, penalties, or feedback that one module sends to another — become the interfaces between modules. Different decompositions of the same global problem yield different architectures, each with distinct tradeoffs in efficiency, robustness, and complexity.
This perspective makes the protocol stack legible. TCP’s congestion control is one subproblem (adjust source rates to match available capacity). Queue management is another (manage buffer occupancy at links). The loss or ECN signal flowing from router to sender is the coordination variable coupling the two. When that signal is delayed — a 10 GB buffer absorbing 8,000 RTTs of packets before dropping — the coupling degrades. The subproblems diverge, and we get bufferbloat. The decomposition did not fail; the coordination signal did.
2.6.1 Six Axes of Decomposition
The Internet decomposes along multiple axes simultaneously. Each axis produces a different cut of the system into subsystems — and each cut determines what coordination signals flow between the pieces.
Administrative decomposition. The Internet divides into Autonomous Systems (ASes), each under independent administrative control. BGP routes traffic between ASes through a process where each AS maximizes its own utility subject to policy constraints. This decomposition is not a design choice — it reflects institutional reality. No single entity controls the global Internet, so the coordination signal between ASes (BGP route announcements) is deliberately coarse: reachability and policy, not capacity or congestion.
Functional decomposition. Within a network, the system splits into a control plane (which computes routing tables, access policies, resource allocations) and a data plane (which forwards packets according to those decisions). SDN made this decomposition explicit by physically separating the controller from the switches. The coordination signal is the flow table: the controller writes rules, the switch executes them.
Protocol stack decomposition. The data plane decomposes vertically into layers — physical, link, network, transport, application. Each layer solves a subproblem and hides its complexity behind an interface. The socket API, the IP datagram service, the Ethernet frame format — these are the coordination signals between layers. This classical layering maps directly to vertical decomposition in the NUM framework.
Endpoint decomposition. The network divides into endpoints and interior nodes. The end-to-end argument (Saltzer et al. 1984) is a decomposition principle: place functionality at endpoints because they have the best information about application requirements. In NUM terms, this is choosing dual decomposition — endpoints decide their own rates based on price signals (loss, delay) — over primal decomposition, where the network centrally allocates rates. TCP is the canonical result: all intelligence at the edges, stateless forwarding in the middle.
Traffic decomposition. Aggregate traffic at a link can be decomposed into individual flows, classes of service, or left undifferentiated. Per-flow fair queuing gives each flow its own subproblem; FIFO is the degenerate case where all packets compete in a single queue. The choice determines who bears the cost of congestion: with FIFO, a single aggressive flow degrades all others. With per-flow queuing, the aggressive flow fills only its own queue.
Temporal decomposition. Different subsystems operate at vastly different timescales: routing reconverges over minutes, congestion control adapts over RTTs (milliseconds to hundreds of milliseconds), scheduling allocates per time slot, and physical-layer modulation operates per symbol (microseconds). This separation is what makes the overall system tractable. Faster subproblems treat slower ones as constants; slower subproblems treat faster ones as averages. When timescales collide — a routing change mid-flow, a congestion event during handover — the decomposition breaks down and cross-layer effects dominate.
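The dual decomposition behind the endpoint axis can be made concrete with a toy computation. The sketch below assumes a single bottleneck link, logarithmic utilities, and a synchronous price update; the capacity and step size are illustrative, not tuned.

```python
# Toy dual decomposition for one bottleneck link shared by n sources.
# Each source solves its own subproblem, maximize log(x) - price * x,
# whose closed-form answer is x = 1 / price. The link coordinates only
# through the price, which it nudges toward matching demand to capacity
# (dual ascent).

def num_dual(capacity=10.0, n_sources=3, step=0.01, iters=5000):
    price = 1.0
    for _ in range(iters):
        # Each source picks its rate from the current price alone.
        rates = [1.0 / price for _ in range(n_sources)]
        # Raise the price when demand exceeds capacity, lower it otherwise.
        price = max(1e-6, price + step * (sum(rates) - capacity))
    return rates, price
```

At the fixed point the price settles at n/capacity and each source receives an equal share of the link. No source ever sees the capacity or the other sources; the price signal alone couples the subproblems, which is the sense in which loss and delay act as prices for TCP.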
2.6.2 Decomposition Produces Subsystems; Invariants Analyze Them
Each decomposition yields subsystems. Each subsystem answers the four invariants: what state does it maintain? What time semantics does it use? How does it coordinate with peers? What interfaces does it expose? The framework in this chapter — invariants, principles, dependency graphs — is the analytical toolkit applied within each subsystem.
But decomposition also reveals the connections between subsystems. The Interface invariant is precisely the coordination signal between subproblems. When we analyze queue management (Chapter 5), the loss signal it sends to transport is an interface — a dual variable coupling two subproblems. When we analyze system composition (Chapter 6), we are studying what happens when these coordination signals degrade.
The six systems studied in this book are not arbitrary categories. They are the functional subproblems that emerge from decomposing the Internet’s resource allocation problem. Understanding why they are separate — and how they are coupled — is as important as understanding how each one works internally.
2.7 The Six Components and the Structure of the Book
To ground the framework, the rest of the book studies six components that collectively span the design space of Internet infrastructure. Each component answers a distinct engineering question:
| Component | Engineering Question | Anchor | Chapters |
|---|---|---|---|
| Medium Access | How do I share a transmission medium fairly among competing transmitters? | Medium physics (shared vs. dedicated, licensed vs. unlicensed) | 2–3 |
| Transport | How do I deliver data reliably and efficiently across a path I don’t control? | IP interface (unreliable datagrams, admin decentralization) | 4 |
| Queue Management | What do I do when packets arrive faster than I can forward them? | Finite buffer at a shared link | 5 |
| Multimedia Applications | How do I deliver time-sensitive content over a best-effort network? | Human perceptual requirements | 7 |
| Network Management | How do I allocate resources and enforce policy across heterogeneous demands? | Need for visibility into opaque systems | 8 |
| Measurement | How do I observe what’s actually happening in a system I can’t fully instrument? | Information asymmetry between ISPs and observers | 8 |
A real networked system — a video call over WiFi, a web page load over LTE — instantiates several components simultaneously. The first four are operational components — they move, manage, or adapt data. The last two are observational components — they instrument and act on the operational systems. Each component’s output shapes the next component’s environment: transport generates the arrival rate that queue management must handle; queue management produces the signals that transport interprets as congestion feedback; multimedia applications receive whatever throughput and delay the components below deliver.
These components are not new territory — they are familiar ground examined through a new lens. You have already seen medium access in Ethernet’s CSMA/CD, where stations listen before transmitting and back off after collisions. You have already seen transport in TCP, where endpoints negotiate reliability through sequence numbers and acknowledgments. You have already seen queue management implicitly — every router you studied in an introductory course has buffers, and what happens when those buffers overflow is the packet loss that TCP interprets as congestion. You have already seen multimedia applications when you studied streaming video or VoIP and noticed that retransmitting a lost video frame 500ms late is worse than skipping it.

This course examines these same components through the invariants lens, revealing design choices you may not have noticed: why does Ethernet use exponential backoff rather than a fixed retry interval? Why does TCP halve its window on loss rather than reduce it by a fixed amount? Why do routers use FIFO queuing rather than per-flow scheduling? Each of these choices is an answer to one or more invariants, shaped by the anchor constraints that component faces.
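One of those design choices, exponential backoff, is compact enough to sketch directly. The function below is an illustrative model of Ethernet-style binary exponential backoff, not an 802.3 implementation: after the k-th collision the contention window doubles (capped, as in Ethernet, at 2^10 slots), so stations thin out their retries without knowing how many competitors exist.

```python
import random

def backoff_slots(attempt, max_exponent=10):
    """Sketch of binary exponential backoff: after the k-th collision,
    wait a uniform number of slots in [0, 2^k - 1]. Doubling the window
    adapts the retry rate to contention no station can observe directly."""
    k = min(attempt, max_exponent)
    return random.randint(0, 2**k - 1)
```

A fixed retry interval would need to be tuned for some assumed station count; the doubling window instead searches for the right spacing automatically, which is why it survives from ALOHA-era radio to modern Ethernet and WiFi.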
The book progresses through three stages of analysis. First, each component independently — anchor, invariants, principles, dependency graph (Chapters 2–5, 7–8). Then, how components compose — what happens when one component’s output degrades another component’s measurement signal (Chapter 6). Finally, how component boundaries shift when the decomposition itself is redesigned — L4S redefining the transport↔queue interface, ECN adding richer coordination signals, QUIC escaping middlebox ossification by moving transport over UDP (Chapter 6, later sections).
2.8 Objectives, Failure, and Meta-Constraints
Different systems optimize different objectives, which is why the same invariant answer can be good in one setting and bad in another. The four invariants describe structural anatomy — but every system also has a purpose. Objectives — throughput maximization, fairness, bounded latency, energy efficiency — define what “answering well” means. Failure semantics — what happens when state is stale, measurement is wrong, coordination breaks, or interfaces are violated — define the cost of answering badly.
Objectives and failure are not fifth and sixth invariants. They are the evaluation criteria applied to the four invariant answers. A dependency graph without objectives is a description. A dependency graph with objectives is a design argument. This framework supplies the structural vocabulary; objectives and failure supply the evaluative vocabulary. Both are necessary.
Meta-constraints shape all invariant answers from outside the technical design space. They explain why the best technical design does not always win:
- Incremental deployability. A protocol that requires coordinated upgrades across the Internet will not be deployed, regardless of its technical merits. TCP survived and evolved because it is endpoint-only — a new congestion control algorithm (Cubic, BBR) can be deployed on a single server without any router upgrade. DNS survived because new record types (AAAA for IPv6, TLSA for DANE, CAA for certificate authority authorization) do not require resolver upgrades — resolvers that do not understand a record type simply pass it through. DHCP survived because its option space is extensible — new options (vendor-specific information, SIP server addresses, NTP server addresses) can be added without breaking old clients, which simply ignore options they do not recognize. The common pattern: all three protocols were designed with extension points that allow incremental change without coordinated deployment.
- Backward compatibility. Every successful protocol must coexist with its predecessors. IPv6 must coexist with IPv4 on the same Internet — dual-stack deployment, tunneling (6to4, Teredo), and translation (NAT64) mechanisms exist solely because of this constraint. DNS’s EDNS0 extension is explicitly backward-compatible: a resolver that does not understand the OPT pseudo-record ignores it and responds within the traditional 512-byte UDP limit. TCP options are negotiated during the three-way handshake — the SYN segment advertises supported options (window scaling, SACK, timestamps), and if the peer does not recognize an option, it is simply not used. Unknown options are ignored, not rejected. This design allows TCP to evolve without breaking connections to older implementations.
- Administrative boundaries. The Internet is not a single network — it is a federation of tens of thousands of independently administered networks. This reality constrains every coordination model. BGP exists because no single routing protocol can span administrative boundaries where operators have conflicting economic incentives and refuse to share internal topology. DHCP is scoped to individual subnets because each network administers its own address space — a DHCP server on one organization’s network has no authority over addresses on another’s. DNS’s hierarchical delegation mirrors administrative hierarchy: ICANN controls the root, registries control TLDs, and organizations control their own zones. No protocol requiring global state has ever been deployed at Internet scale, because no entity has the authority to maintain it.
- Hardware economics. Router hardware constrains what processing can happen per packet. Simple FIFO queuing dominates because it requires no per-flow state and no per-packet computation — alternatives like per-flow fair queuing are more expensive to implement at line rate.
- Standardization history. Some separations reflect organizational boundaries (who was in the room at the IETF), not analytical independence. The TCP/UDP split at the transport layer is as much historical artifact as principled disaggregation.
2.9 What the Framework Predicts
A framework that only describes is a taxonomy. This framework makes concrete predictions — testable claims that follow from tracing the dependency graph.
Prediction 1: Direct coupling + distributed coordination → destructive scaling. Any system where agents share environment state, act through direct coupling (my action destroys your action, not merely delays it), and coordinate without central authority will face destructive scaling as agent count grows. Ethernet’s original shared bus confirmed this — collisions wasted bandwidth destructively as station count grew, which is why Ethernet moved to switches. ALOHA confirmed it even earlier [Abramson, 1970]. Wireless networks, studied in Chapter 2, confirm it in a setting where switching to dedicated links is physically impossible. The prediction generalizes: identify these three properties in a new system, and you can predict the failure mode before building it.
Concrete test case: WiFi (802.11). Apply the prediction’s three criteria. Shared environment state? Yes — all stations share the same radio channel. Direct coupling? Yes — simultaneous transmissions on the same frequency produce destructive interference; both frames are corrupted, not merely delayed. Distributed coordination? Yes — each station runs CSMA/CA independently, deciding when to transmit based on local carrier sensing. The prediction says: this system will not scale. Does it? WiFi performance degrades severely as station count grows — in a dense lecture hall with 200 laptops, throughput per station collapses because the probability that any given transmission collides climbs steeply with contention. This is precisely why 802.11ax (WiFi 6) introduced OFDMA and BSS Coloring — mechanisms that move toward centralized scheduling (the access point coordinates who transmits when), relaxing the distributed coordination constraint that Prediction 1 identifies as the scaling bottleneck.
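The collapse can be seen in a one-line model. Under a slotted-ALOHA-style abstraction, each of n stations transmits independently with probability p per slot, and any overlap corrupts every frame involved; the per-slot success probability then falls off steeply with n. The parameter values below are illustrative.

```python
def success_probability(n, p):
    """Probability that exactly one of n stations transmits in a slot,
    each transmitting independently with probability p. Any slot with
    two or more transmitters is wasted (direct coupling)."""
    return n * p * (1 - p) ** (n - 1)
```

With p = 0.1, the success probability is about 0.33 at n = 5 but collapses to roughly 0.03 at n = 50: the destructive-scaling signature the prediction identifies, in closed form.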
Prediction 2: When environmental constraints shift, the invariant under greatest pressure tends to restructure first — and the change propagates through the dependency graph. TCP’s evolution is consistent with this. Bandwidth scaling pressured the Time/State invariants → newer TCP variants restructured their state model — Cubic uses a cubic growth function instead of linear [Ha et al., 2008]. Datacenter RTT collapse pressured measurement quality → datacenter TCP variants restructured the measurement signal — using explicit router marks instead of waiting for packet loss [Alizadeh et al., 2010]. Hardware programmability pressured the Time invariant → offloading transport to specialized hardware restructured where decisions execute. The test of this prediction is whether it can identify the restructuring point before seeing the redesign, not rationalize it after.
Concrete test case: TCP to BBR. What environmental constraint shifted? The rise of high bandwidth-delay product (BDP) paths — long-haul fiber links with bandwidths of 10+ Gbps and RTTs of 50–200ms. On these paths, the BDP (bandwidth times RTT) can exceed 100 MB, meaning the pipe can hold enormous amounts of data in flight. Traditional loss-based TCP (Reno, Cubic) fills buffers until a packet drops, then halves the window. On high-BDP paths with large buffers, this means TCP overshoots massively before getting a signal — the State invariant is under pressure because the measurement signal (loss) arrives far too late. BBR restructures State first: instead of using loss as the primary signal, BBR builds an explicit model of the path — estimating bottleneck bandwidth from delivery rate measurements and minimum RTT from round-trip time observations. The State restructuring (from implicit loss-based belief to explicit path model) propagates through the dependency graph: Time changes (BBR uses a windowed min-RTT estimator rather than Jacobson’s SRTT), and the closed-loop dynamics change (BBR paces packets at the estimated bottleneck rate rather than probing via window inflation). Coordination remains distributed — BBR is still endpoint-only — confirming that the pressure was on State, not Coordination.
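BBR's restructured State can be sketched as a pair of windowed extremum filters, which is the shape of its path model: a windowed minimum for RTT and a windowed maximum for delivery rate. This is a simplified sketch, not BBR's implementation, and the window lengths are illustrative.

```python
from collections import deque

class WindowedFilter:
    """Sketch of a BBR-style windowed extremum filter: track the best
    (min or max) sample seen within a sliding window, so stale beliefs
    about the path age out automatically."""

    def __init__(self, window, better):
        self.window = window    # how long a sample stays eligible
        self.better = better    # min for RTT, max for delivery rate
        self.samples = deque()  # (time, value)

    def update(self, now, value):
        self.samples.append((now, value))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        return self.better(v for _, v in self.samples)

min_rtt = WindowedFilter(window=10.0, better=min)  # seconds of wall time
btl_bw  = WindowedFilter(window=10, better=max)    # rounds, in real BBR
```

Contrast this with Jacobson's SRTT, an EWMA that blends all history together: the windowed minimum deliberately forgets, because a path's propagation delay is a floor to be rediscovered, not an average to be smoothed.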
Prediction 3: Relaxing interface or coordination constraints enables tighter belief-environment coupling, which enables operation at tighter margins. Congestion control inside a single datacenter operates at higher utilization and lower latency than TCP across the wide-area Internet precisely because administrative consolidation relaxes coordination constraints and enables richer measurement signals. Any system moving from multi-administrative to single-administrative control follows this trajectory.
Concrete test case: DCTCP in the datacenter. A datacenter is a single administrative domain — one operator controls all switches, all servers, and all software. This relaxes the coordination constraint: the operator can configure ECN marking on every switch, guaranteeing that all routers will mark packets with Congestion Experienced (CE) when queue occupancy exceeds a threshold. The operator can also guarantee that all senders run DCTCP, eliminating the fairness problem of mixing DCTCP with legacy TCP. With these constraints relaxed, DCTCP achieves tighter belief-environment coupling: ECN marks arrive before any packet is dropped, providing an earlier and richer congestion signal than loss. The result is operation at tighter margins — DCTCP maintains queue occupancy near zero (target: a few packets) while keeping utilization above 90%, compared to wide-area TCP, which tolerates hundreds of milliseconds of queuing delay. The single-admin constraint is what makes this possible: on the open Internet, you cannot mandate ECN support on every router or DCTCP on every sender.
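DCTCP's tighter coupling shows up in its estimator, which is simple enough to sketch. The class below follows the published update rule — an EWMA of the per-window marked fraction, with the window cut in proportion to it — but it is a pedagogical model, not a production congestion controller.

```python
class DctcpState:
    """Sketch of DCTCP's congestion estimator: alpha tracks the fraction
    of ECN-marked packets per window, and the window is reduced in
    proportion to alpha rather than halved unconditionally.
    g = 1/16 is the gain suggested in the DCTCP paper."""

    def __init__(self, cwnd, g=1.0 / 16):
        self.cwnd = cwnd
        self.alpha = 0.0
        self.g = g

    def on_window_end(self, acked, marked):
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Mild marking -> mild reduction; sustained heavy marking
            # drives alpha toward 1 and the cut toward a classic halving.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
```

The richer signal (how many packets were marked, not merely whether one was lost) is what the single-admin domain buys: loss is one bit per window, while the marked fraction is a graded measurement of queue pressure.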
These predictions are intended to be falsifiable. A system with direct coupling and distributed coordination that scales gracefully would challenge Prediction 1. An environmental shift that restructures an invariant other than the most-pressured one would challenge Prediction 2. Falsifiability gives the framework empirical teeth — it constrains what the framework allows, which means the framework can be wrong.
2.10 The Method
Given any system — one you are designing, studying, or reviewing — follow five steps:
Step 1. Identify the anchor. What constraints are inherited from physics, architecture, or deployment history? These are the hardest to change and the most consequential for all other choices.
Step 2. Answer the four invariants. For each: what state exists (environment, measurement, belief)? How is time handled? Who coordinates? What are the interfaces? Be specific — name the actual state variables, the actual signals, the actual decision rules.
Step 3. Trace the dependency graph. Which invariant answers constrain which others? How does the anchor narrow the feasible space? Where do principles (disaggregation, closed-loop, decision placement) explain the choices?
Step 4. Evaluate the closed-loop dynamics. Given these invariant answers, will the system converge? What happens when the measurement signal degrades? When the environment changes? Where are the failure modes?
Step 5. Check the meta-constraints. Is this design deployable? Backward-compatible? Economically viable? What standardization or organizational factors shape the design beyond technical merit?
A useful question for reviewing any systems paper: which invariant does this system fundamentally improve, and how does that change ripple through the dependency graph?
Worked example: applying the five steps to DNS. To see the method in action, apply all five steps to a protocol you already know.
Step 1. Identify the anchor. The global namespace is too large for a single server (billions of records), and the Internet’s administrative decentralization means no single entity has authority over all names. These two constraints — scale and administrative fragmentation — anchor the design.
Step 2. Answer the four invariants. State: hierarchically distributed. The root zone stores pointers to TLD servers; TLD servers store pointers to authoritative servers; authoritative servers store the actual records. Caching resolvers at every level store copies with TTL-bounded lifetimes. The environment state is the current set of name-to-address mappings. The measurement signal is the query-response exchange. The belief is the cached record — which may be stale. Time: prescribed by the authority. Each record carries a TTL set by the zone administrator. Resolvers count down the TTL and re-query when it expires. The authority controls the freshness-load tradeoff unilaterally. Coordination: hierarchical delegation. Each level delegates authority to the level below. The root delegates .com to Verisign; Verisign delegates google.com to Google. No two authorities need to agree on anything outside their own zone. Interface: UDP on port 53, with TCP fallback for large responses. EDNS0 extends the UDP size limit. DoH and DoT encrypt queries for privacy.
Step 3. Trace the dependency graph. The anchor (scale + decentralization) forces hierarchical coordination — no single server can handle all queries, and no single entity has universal authority. Hierarchical coordination forces distributed state — each level caches independently, with no central cache. Distributed cached state forces TTL-based time — without a mechanism to invalidate caches, expiration is the only way to bound staleness. The interface (UDP) reflects the query-response pattern and latency sensitivity of name resolution.
Step 4. Evaluate the closed-loop dynamics. The feedback loop is: query → cache → serve from cache → TTL expires → re-query. The loop period is the TTL. When the loop is too slow (TTL too long), resolvers serve stale records — users reach decommissioned servers. When the loop is too fast (TTL too short), authoritative servers face query storms. The loop has no adaptive gain — the TTL is a static parameter. The system converges to correct state after one TTL period, but during that period, inconsistency is tolerated. Global synchronization is possible: if a popular record’s TTL expires at roughly the same time across many resolvers, the authoritative server faces a coordinated burst.
Step 5. Check the meta-constraints. DNS is incrementally deployable — new record types can be added without upgrading resolvers (resolvers that do not understand a type pass it through). EDNS0 is backward-compatible — old servers ignore the OPT record. Administrative boundaries are respected — each organization controls its own zone. The root zone’s governance (ICANN) is a political meta-constraint that shapes the entire hierarchy. DNS-over-HTTPS raises a new meta-constraint: it shifts trust from the network path to the resolver operator, a change with political and regulatory dimensions that the technical design alone does not resolve.
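The synchronization risk identified in Step 4 can be demonstrated with a small simulation. Everything here is illustrative — the resolver count, the TTL, the cache-population window — and the point is only the shape of the load arriving at the authoritative server, and how jittering the TTL, a mitigation some operators apply, flattens the burst.

```python
import random

def peak_queries_per_second(n_resolvers=1000, ttl=300, jitter=0.0,
                            horizon=3600, seed=1):
    """Sketch of TTL synchronization: resolvers that populated their
    caches at similar times re-query in a coordinated burst each TTL
    period. jitter widens each resolver's effective TTL by +/- that
    fraction, spreading the expiries out."""
    rng = random.Random(seed)
    buckets = {}  # second -> queries arriving at the authoritative server
    for _ in range(n_resolvers):
        t = rng.uniform(0, 60)  # all caches populated within one minute
        while t < horizon:
            t += ttl * (1 + rng.uniform(-jitter, jitter))
            sec = int(t)
            buckets[sec] = buckets.get(sec, 0) + 1
    return max(buckets.values())
```

With zero jitter the expiries stay phase-locked to the initial population window and the authoritative server sees the same burst every TTL period; with jitter the phases drift apart and the per-second peak drops toward the average rate.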
2.11 Generative Exercise: QUIC and the Interface Renegotiation
QUIC moves transport over UDP to avoid middlebox ossification [Langley et al., 2017]. The Interface invariant is under pressure: middleboxes inspect and modify TCP headers, making new transport features undeployable. QUIC renegotiates the interface — and the framework predicts the cascade:
- Interface: TCP → UDP encapsulation. Encrypted headers are opaque to middleboxes.
- State: Connection state moves to user-space. Connection migration (changing IP mid-session) becomes feasible — the identifier is a QUIC-level connection ID, not the IP/port 5-tuple.
- Time: 0-RTT handshake becomes possible. TCP’s three-way handshake is tied to kernel SYN processing; QUIC caches credentials and resumes immediately.
- Coordination: Unchanged — still distributed, still endpoint-driven.
The framework predicts that an interface renegotiation cascades through State and Time but leaves Coordination intact — because the environmental pressure was on interface ossification, not on who decides.
Exercise for the reader: HTTP/3 runs exclusively over QUIC. What invariant answers change at the application layer compared to HTTP/2 over TCP? Trace the dependency graph.
2.11.1 In-Class Discussion Exercises
These exercises are designed for group work during class. Each can be discussed productively in 10–15 minutes.
Exercise 1: What if DNS had no caching? Suppose every DNS query had to traverse the full hierarchy — root server, TLD server, authoritative server — every time. Trace the impact on each invariant. How does the State invariant change? (No local belief — every query produces a fresh answer, but load on authoritative servers increases by orders of magnitude.) How does the Time invariant change? (No TTL, no staleness — but no buffering against server unreachability either.) How does the Coordination invariant change? (The hierarchy still delegates, but every resolver must contact every level for every query — the hierarchy bears the full query load.) What would happen to root server traffic? (The root is served by 13 named server identities, each an anycast cluster of many instances, and caching currently absorbs the overwhelming majority of resolutions before they reach it. Without caching, every resolution worldwide would begin at the root, multiplying its load by orders of magnitude.)
Exercise 2: What if DHCP leases never expired? Suppose the server grants an address permanently — no lease duration, no renewal, no reclamation. What problems arise? (Address exhaustion: departed clients never return their addresses. A coffee shop with a /24 subnet — 254 usable addresses — would exhaust its pool in a day. The server would have no mechanism to distinguish a client that left five minutes ago from one that will return tomorrow.) Which invariant is most affected? (Time — the lease duration is the mechanism that couples address allocation to actual usage. Without it, the state invariant degrades: the server’s belief about who holds each address diverges from reality, with no measurement signal to correct it.)
Exercise 3: Compare ARP and DHCP through the four invariants. Both protocols operate at bootstrap time on a LAN. Both use broadcast. But they solve different problems and make different structural choices. ARP maps IP addresses to MAC addresses; DHCP assigns IP addresses. Apply the four invariants to ARP: State (distributed — every host caches its own ARP table), Time (prescribed timeout, typically 20 minutes), Coordination (none — any host can claim any mapping, which is why ARP spoofing is trivially easy), Interface (broadcast query, unicast reply). Why is ARP’s coordination model so different from DHCP’s? (ARP does not allocate a scarce resource — MAC addresses are pre-assigned by manufacturers — so the uniqueness constraint that forces DHCP’s centralization does not apply.)
Exercise 4: Explicit rate feedback. TCP uses loss as a congestion signal. Suppose routers could send explicit rate feedback — “you may send at 50 Mbps on this path.” Which invariants change? (State changes: the sender receives an authoritative measurement signal instead of inferring capacity from loss. Time changes: the feedback loop tightens — the sender learns the available rate in one RTT rather than probing over many RTTs. Coordination changes: the router becomes an active participant in rate allocation, shifting from distributed to hybrid coordination.) What are the deployment obstacles? (Every router on the path must participate — a single legacy router that does not send rate feedback breaks the mechanism. This is the incremental deployability meta-constraint.)
2.11.2 Independent Reading Exercises
These exercises are designed for deeper self-study. Each requires applying the framework to a system not covered in class.
Exercise 5: Apply the five-step method to HTTP/2. Identify the anchor constraint that motivated HTTP/2’s design (head-of-line blocking in HTTP/1.1, where a single slow response blocks all subsequent responses on the same connection). Answer the four invariants: what state does HTTP/2 maintain (stream multiplexing state, HPACK header compression state, flow control windows per stream)? How is time handled (stream priorities, flow control window updates)? Who coordinates (client and server negotiate settings; server can push resources)? What is the interface (binary framing layer over a single TCP connection)? Trace the dependency graph from the anchor. Evaluate the closed-loop dynamics — in particular, how does TCP’s head-of-line blocking at the transport layer undermine HTTP/2’s stream multiplexing at the application layer?
Exercise 6: OSPF vs. BGP — why different coordination choices? Both OSPF and BGP solve routing. Apply the framework to explain why they make radically different coordination choices. Start with the anchor: OSPF operates within a single administrative domain (one operator controls all routers); BGP operates across administrative boundaries (each AS has independent, often competing, interests). Trace how this anchor difference produces different answers for every invariant. Why does OSPF share complete topology information while BGP shares only reachability? Why does OSPF converge to a globally consistent view while BGP may never converge? Why does OSPF use link-state flooding while BGP uses path-vector announcements?
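One consequence of BGP's coordination choice is worth sketching: because announcements carry the full AS path, each AS can reject loops locally, without the globally consistent topology view OSPF builds. This is BGP's actual loop-prevention rule, shown here as a minimal illustrative function.

```python
def accept_announcement(my_as, as_path):
    """BGP path-vector loop prevention: discard any route whose AS path
    already contains our own AS number. No global view required --
    each AS decides from information in the announcement itself."""
    return my_as not in as_path
```

OSPF needs no such rule because every router inside the domain computes routes from the same flooded topology; BGP cannot assume that, so the loop check must travel with the route.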
Exercise 7: Design a configuration distribution protocol. You need to distribute configuration files to 1,000 servers in a datacenter. The files change roughly once per hour. Use the framework: identify your anchor constraint (single administrative domain, low latency requirement, file consistency across all servers). Answer the four invariants — who holds the authoritative copy? How do servers learn about updates? How is time handled (polling interval vs. push notification)? What interface do servers use to retrieve files? Justify each choice by tracing it to the anchor. Compare your design to a pull-based approach (servers poll periodically) and a push-based approach (a central server notifies all servers on every change). Which invariant answers differ, and why?
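The pull-versus-push comparison in Exercise 7 reduces to back-of-envelope arithmetic on the Time invariant. The sketch below is one way to frame it (function names and numbers are illustrative): with pull, worst-case staleness is one polling interval and the authoritative server carries steady load even when nothing changes; with push, traffic is proportional to the change rate.

```python
def pull_worst_case_staleness(poll_interval_s):
    # A server may have polled just before the change landed.
    return poll_interval_s

def pull_hourly_requests(num_servers, poll_interval_s):
    # Steady-state load on the authoritative server, changes or not.
    return num_servers * (3600 // poll_interval_s)

def push_hourly_requests(num_servers, changes_per_hour):
    # Push generates traffic only when the file actually changes.
    return num_servers * changes_per_hour
```

With 1,000 servers polling every 60 seconds, the server fields 60,000 requests per hour to distribute roughly one change; push sends 1,000 notifications. The price of push is the state it implies: the central server must now track which servers exist and which are reachable.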
Exercise 8: QUIC and middlebox dependency graphs. QUIC encrypts its transport headers, making them opaque to middleboxes (firewalls, NATs, load balancers, traffic shapers) that previously inspected TCP headers to make forwarding, filtering, or shaping decisions. Predict how this changes the dependency graph for these middleboxes. What state did middleboxes maintain about TCP connections (sequence numbers for stateful firewalling, connection tracking for NAT)? How did they obtain that state (passive observation of TCP headers)? What happens when the measurement signal disappears (encrypted QUIC headers)? How must the middlebox’s coordination model change? Consider specifically: how does a stateful firewall track QUIC connections without seeing transport headers? How does a load balancer distribute QUIC connections across backend servers without inspecting the transport layer?
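For the load-balancer half of Exercise 8, one deployed answer is worth knowing: QUIC's destination connection ID remains visible in the packet header even though the rest of the transport state is encrypted, so a balancer can route on the connection ID instead of the 4-tuple. The encoding below (first byte selects the backend) is a deliberately simplified illustration, not a real scheme.

```python
def pick_backend(connection_id: bytes, num_backends: int) -> int:
    """Route on the visible QUIC connection ID, not the 4-tuple.
    Because the client keeps the same connection ID across address
    changes, the mapping survives migration -- something 4-tuple
    hashing cannot offer."""
    return connection_id[0] % num_backends
```

The structural shift is that the endpoint now cooperates with the middlebox by encoding routable information into a field it controls, rather than the middlebox passively inspecting transport headers it was never a party to.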
2.12 References
- Abramson, N. (1970). “The ALOHA System — Another Alternative for Computer Communications.” Proc. AFIPS Fall Joint Computer Conference.
- Alizadeh, M. et al. (2010). “Data Center TCP (DCTCP).” Proc. ACM SIGCOMM.
- Cardwell, N., Cheng, Y., Gunn, C.S., Yeganeh, S.H., and Jacobson, V. (2016). “BBR: Congestion-Based Congestion Control.” ACM Queue, 14(5).
- Clark, D. (1988). “The Design Philosophy of the DARPA Internet Protocols.” Proc. ACM SIGCOMM.
- Deering, S. (1998). “Watching the Waist of the Protocol Hourglass.” Keynote, IETF 43.
- Ha, S., Rhee, I., and Xu, L. (2008). “CUBIC: A New TCP-Friendly High-Speed TCP Variant.” ACM SIGOPS Operating Systems Review, 42(5).
- Jacobson, V. (1988). “Congestion Avoidance and Control.” Proc. ACM SIGCOMM.
- Lamport, L. (1978). “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM, 21(7).
- Langley, A. et al. (2017). “The QUIC Transport Protocol: Design and Internet-Scale Deployment.” Proc. ACM SIGCOMM.
- McKeown, N. et al. (2008). “OpenFlow: Enabling Innovation in Campus Networks.” ACM SIGCOMM Computer Communication Review, 38(2).
- Nichols, K. and Jacobson, V. (2012). “Controlling Queue Delay.” ACM Queue, 10(5).
- Ramakrishnan, K., Floyd, S., and Black, D. (2001). “The Addition of Explicit Congestion Notification (ECN) to IP.” RFC 3168.
- Saltzer, J.H., Reed, D.P., and Clark, D.D. (1984). “End-to-End Arguments in System Design.” ACM Trans. Computer Systems, 2(4).
- Shannon, C.E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27(3).
- Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
- 3GPP (2018). “Study on New Radio Access Technology.” 3GPP TR 38.912.
This chapter is part of “A First-Principles Approach to Networked Systems” by Arpit Gupta, UC Santa Barbara, licensed under CC BY-NC-SA 4.0.