Chapter 2: First Principles for Networked Systems

Design Principles & The Dependency Graph

Arpit Gupta

2026-04-07

From Invariants to Principles

Invariants vs. Principles — The Distinction Matters

Invariants (last lecture)

What must be answered

  • State — What exists?
  • Time — When do things happen?
  • Coordination — Who decides?
  • Interface — How do components interact?

Every system answers these. No choice.

Principles (today)

How to answer well under constraints

  • Disaggregation
  • Closed-Loop Reasoning
  • Decision Placement

These are strategies — you can violate them, but you pay a cost.

Three Design Principles

What If DNS Also Did Routing?

Imagine a single system that resolves names, assigns IP addresses, AND computes routes. One protocol, one server, one database.

What goes wrong?

  • Change the routing algorithm → you break name resolution
  • Update the address plan → you break routing
  • Different organizations manage names (ICANN) vs. addresses (IANA) vs. routes (ISPs) — who controls the combined system?

The Internet disaggregates: DNS handles naming, IP handles addressing, BGP handles routing. Each evolves independently. IPv4→IPv6 doesn’t require renaming every domain.

Disaggregation — Separation of Concerns Under Constraints

Principle: Decompose a system into independent components that can evolve separately.

System         | What’s disaggregated                                 | Boundary it aligns with
DNS/IP/BGP     | Naming from addressing from routing                  | Administrative (ICANN vs. IANA vs. ISPs)
Protocol stack | Application / Transport / Network / Link / Physical  | Functional (each layer provides a service)
5G             | CU / DU / RU split                                   | Temporal (seconds vs. ms vs. sub-ms decisions)

Cost: Every interface between components adds overhead, latency, and complexity.

When NOT to disaggregate: When interface overhead dominates the benefit (e.g., kernel TCP vs. DPDK bypass — bypassing the kernel merges transport and application for speed).

Quick Quiz: Find the Feedback Loop

TCP sends a packet. An ACK comes back. TCP adjusts cwnd. TCP sends more packets.

That’s a feedback loop. Signal: ACK. Decision: adjust cwnd. Period: ~1 RTT.
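The loop above can be sketched in a few lines. This is an illustrative toy, not a real TCP stack: `alpha`, `beta`, and the fixed `capacity` threshold are assumed parameters, and each iteration stands in for one RTT.

```python
# Minimal sketch of TCP's AIMD feedback loop (illustrative, not a real stack).
# Signal: ACK or loss. Decision: adjust cwnd. Period: one RTT per iteration.

def aimd_step(cwnd: float, loss: bool, alpha: float = 1.0, beta: float = 0.5) -> float:
    """One RTT of additive-increase / multiplicative-decrease."""
    if loss:
        return max(1.0, cwnd * beta)   # multiplicative decrease on loss
    return cwnd + alpha                # additive increase on an ACKed RTT

# Drive the loop: loss whenever cwnd exceeds a (hypothetical) bottleneck capacity.
cwnd, capacity = 1.0, 10.0
trace = []
for _ in range(20):
    loss = cwnd > capacity
    cwnd = aimd_step(cwnd, loss)
    trace.append(round(cwnd, 1))

print(trace)  # sawtooth: climbs past capacity, halves, climbs again
```

Running it shows the classic sawtooth: the loop never settles, it oscillates around capacity — exactly the "oscillation" failure mode in the table below.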

Now: Is DNS caching a feedback loop?

Yes. Cache a record → serve it → TTL expires → re-query the authority → update cache. Signal: query-response. Period: TTL value.

Is DHCP leasing a feedback loop?

Yes. Allocate address → client uses it → lease expires → client renews or server reclaims. Signal: renewal request. Period: lease duration.

Every adaptive protocol is a feedback loop. The question is: will it converge? What happens when the signal is delayed or wrong?

Closed-Loop Reasoning — How Decisions Adapt Over Time

Protocol   | Signal              | Period                | Adaptive?                        | Failure mode
TCP AIMD   | ACK arrivals / loss | ~1 RTT                | Yes — cwnd adjusts continuously  | Oscillation (sawtooth), bufferbloat (delayed signal)
DNS TTL    | Query-response      | TTL value (min–hours) | No — TTL is static               | Staleness (too long) or query storms (too short)
DHCP lease | Renewal requests    | Lease duration        | No — lease is static             | Address exhaustion (too long) or churn (too short)

Key difference: TCP’s loop adapts (cwnd changes based on feedback). DNS and DHCP loops are static timers — they expire and refresh, but don’t adjust their period based on conditions.

A static timer can’t adapt to changing conditions. This is why stale (or poisoned) DNS records persist: once cached, a wrong answer is served until the TTL expires — the loop has no signal that its belief is bad.
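The static-timer loop can be made concrete with a toy TTL cache. Everything here is illustrative: the class, the record value, and the query times are assumptions, but the structure shows the point — the refresh period is fixed at cache-fill time and never adapts.

```python
# Minimal sketch of a DNS-style TTL cache (illustrative, not a resolver).
# The loop is "closed" (expire -> re-query -> refresh), but its period is a
# static TTL: it never adjusts to how often the record actually changes.

class TTLCache:
    def __init__(self):
        self._store = {}  # name -> (record, expiry_time)

    def get(self, name, resolve, ttl, now):
        record, expiry = self._store.get(name, (None, 0.0))
        if now >= expiry:                        # timer fired: re-query authority
            record = resolve(name)
            self._store[name] = (record, now + ttl)
        return record

queries = []
def authoritative_lookup(name):
    queries.append(name)           # count upstream queries
    return "93.184.216.34"         # hypothetical A record

cache = TTLCache()
for t in [0, 1, 2, 31, 32]:        # lookups at these times; TTL = 30 s
    cache.get("example.com", authoritative_lookup, ttl=30, now=t)

print(len(queries))  # 2: one cold miss at t=0, one refresh after expiry at t=31
```

Note what is missing: nothing in `get` looks at how often the answer changed. A record updated at t=5 is served stale until t=31, whatever happens upstream.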

Why Did WiFi Change Its Mind?

Original WiFi (802.11 DCF): every station decides independently when to transmit. Distributed.

WiFi 6 (802.11ax OFDMA): the access point schedules who transmits when. Centralized.

What changed? The protocol reversed its coordination model. Why?

The environment changed. A coffee shop in 2000: 5 laptops. A lecture hall in 2025: 200 devices.

Distributed contention with 200 devices → collision probability explodes → throughput collapses.
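A back-of-envelope model makes the collapse visible. This is a simplified slotted-contention sketch, not 802.11 DCF itself (real DCF uses carrier sensing and binary exponential backoff); the access probability `p` is an assumed parameter tuned for a small room.

```python
# Why distributed contention collapses at scale (slotted toy model, not DCF).
# If each of n stations transmits in a slot with probability p, the slot
# carries a successful frame only when exactly ONE station transmits:
#   P(success) = n * p * (1 - p)^(n - 1)

def p_success(n: int, p: float) -> float:
    return n * p * (1 - p) ** (n - 1)

p = 0.2  # access probability tuned for a ~5-station coffee shop
for n in [5, 20, 50, 200]:
    print(n, round(p_success(n, p), 4))
```

With `p` fixed for the small-room regime, success probability falls from ~0.41 at 5 stations to effectively zero at 200 — the same parameterization that worked in 2000 is destructive in a dense 2025 lecture hall.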

Dense deployments made distributed coordination destructive. The anchor shifted, and the decision placement had to follow.

Decision Placement — Where Control Authority Resides

Principle: Decide where decisions are made based on what information is available where.

Centralized placement

  • One entity has global view
  • Can optimize globally
  • Single point of failure
  • Doesn’t scale with agents

Examples: DHCP server, 4G eNodeB scheduler, SDN controller

Distributed placement

  • Each agent has local view only
  • No single point of failure
  • Scales with agent count
  • May never reach global optimum

Examples: TCP endpoints, WiFi DCF, BGP routers

The question isn’t “which is better” — it’s “what does your anchor constraint force?”

Three Principles: Summary

Disaggregation answers: how to divide the system. Closed-loop reasoning answers: how decisions adapt. Decision placement answers: where decisions are made.

The invariants tell you what to answer. The principles tell you how to answer well.

The Anchored Dependency Graph

What Is an Anchor?

An anchor is a constraint that is harder to change than the invariant answers it constrains.

Source             | Example                                 | What it forces
Physics            | Wireless medium is shared               | Carrier sensing, contention-based access
Legacy interface   | IP delivers unreliable datagrams        | TCP must infer congestion, build reliability
Admin boundaries   | No single entity controls the Internet  | Distributed coordination, no central scheduler
Hardware economics | Commodity switches use FIFO queues      | Fair queuing is expensive → most routers don’t do it
Deployment reality | Billions of devices speak TCP           | New transport must tunnel through UDP (QUIC)

What makes something an anchor: it is harder to change than the design choices it constrains.

The Dependency Graph

Anchor → constrains feasible invariant answers → principles guide choices within constraints → choices produce closed-loop dynamics → dynamics produce emergent properties

Worked Example: TCP’s Dependency Graph

Anchor: IP provides unreliable datagrams + no single entity controls the Internet

Tracing the cascade:

  1. Admin decentralization → Coordination: distributed (no central scheduler possible)
  2. Distributed coordination → State: endpoint-local (each sender builds its own model)
  3. No network feedback → Time: inferred (RTT estimated from ACK arrivals)
  4. Reliability gap in IP → Interface: byte stream above, datagrams below

Principles at work:

  • Disaggregation: transport separated from network layer (IP handles routing, TCP handles reliability)
  • Closed-loop: AIMD is the feedback loop (send → measure → adjust)
  • Decision placement: endpoints decide rates (end-to-end argument)

Every design choice in TCP traces back to the anchor. Change the anchor → the design restructures.
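The cascade above can be written down as a tiny directed graph and traced mechanically. The node labels simply transcribe the four steps from the text; the traversal is a plain breadth-first walk from the anchor, showing that every design choice is reachable from it.

```python
# TCP's dependency cascade as a small directed graph (labels from the text).
# A breadth-first walk from the anchor visits every downstream design choice.

from collections import deque

edges = {
    "anchor: unreliable IP + admin decentralization": [
        "coordination: distributed",
        "interface: byte stream over datagrams",
    ],
    "coordination: distributed": ["state: endpoint-local"],
    "state: endpoint-local": ["time: inferred from ACKs (RTT)"],
    "time: inferred from ACKs (RTT)": ["dynamics: AIMD feedback loop"],
}

def trace(root):
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

for node in trace("anchor: unreliable IP + admin decentralization"):
    print(node)
```

Changing the anchor means replacing the root node — and the reachable set (the design) changes with it.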

What Is DNS’s Anchor?

Apply the method: what constraint is hardest to change for DNS, and how does it force the design?

Anchor: global namespace too large for one server + administrative fragmentation (no single entity owns all names)

Cascade:

  1. Scale + fragmentation → Coordination: hierarchical delegation (root → TLD → authoritative)
  2. Hierarchical delegation → State: distributed caching at every level
  3. No way to push invalidations → Time: prescribed TTL (authority sets expiry unilaterally)
  4. Latency-sensitive lookups → Interface: UDP port 53 (single request-response fits in one packet)

Same method, different anchor, entirely different design. That’s the framework.

The Five-Step Method

The Analytical Recipe

Given any system — one you are designing, studying, or reviewing:

Step | Action                        | What you produce
1    | Identify the anchor           | The constraint hardest to change
2    | Answer the four invariants    | Specific state variables, signals, decision rules
3    | Trace the dependency graph    | Which answers constrain which others
4    | Evaluate closed-loop dynamics | Convergence? Failure modes? What if signal degrades?
5    | Check meta-constraints        | Deployable? Backward-compatible? Economically viable?

A useful question for reviewing any systems paper: which invariant does this system fundamentally improve, and how does that change ripple through the dependency graph?

Objectives, Failure, and Meta-Constraints

A dependency graph without objectives is a description. With objectives, it’s a design argument.

Objectives: throughput, latency, fairness, reliability — what the system tries to optimize

Failure: what happens when the loop breaks — measurement signal degrades, environment changes faster than belief can track

Meta-constraints — forces beyond technical merit:

Meta-constraint           | Example
Incremental deployability | ECN took decades — every router on the path must participate
Backward compatibility    | IPv6 adoption stalled for 20+ years
Administrative boundaries | Can’t mandate DCTCP on the open Internet
Hardware economics        | Fair queuing is technically superior but FIFO dominates because it’s cheaper
Standardization politics  | IETF consensus process shapes what gets deployed

The Six Systems and the Course Roadmap

System           | Core Question                                          | Anchor                               | Chapters
Medium Access    | How to share the transmission medium fairly?           | Medium physics (shared, destructive) | Ch 3–4
Transport        | Deliver reliably across an uncontrolled path?          | IP interface (unreliable datagrams)  | Ch 2
Queue Management | What to do when packets arrive faster than they leave? | Finite buffer at bottleneck          | Ch 6
Multimedia Apps  | Deliver time-sensitive content over best-effort?       | Human perceptual time constraints    | Ch 8
Network Mgmt     | Allocate resources and enforce policy?                 | Need for visibility across admin domains | Ch 9
Measurement      | Observe what’s happening in an opaque system?          | Information asymmetry                | Ch 9

The framework is the same for all six. The anchor changes → the answers change → the design changes.

Exercise: Dependency Reasoning

In-Class Exercise: What If DNS Had No Caching?

Suppose every DNS query had to traverse the full hierarchy — root → TLD → authoritative — every single time. No local cache, no TTL, no stored answers.

Your task (5 minutes, work in pairs):

  1. Trace the impact on each invariant (State, Time, Coordination, Interface)
  2. Which invariant is under the most pressure?
  3. What would happen to root server traffic? (Hint: root servers currently handle ~10,000 queries/sec)

This is another midterm-style question. The midterm asks you to trace what-if scenarios through the dependency graph.

Exercise Discussion: DNS Without Caching

Invariant    | With caching                                           | Without caching
State        | Distributed cache at every resolver                    | No local belief — every query produces a fresh answer
Time         | TTL-based expiry (minutes to hours)                    | No TTL needed — but also no latency savings
Coordination | Hierarchy + local autonomy (cache serves most queries) | Hierarchy bears full query load at every level
Interface    | Same (UDP port 53)                                     | Same — but now every query hits the network

Under most pressure: State. Removing caching eliminates the local belief layer entirely. Every resolver must contact every level of the hierarchy for every query.

Scaling consequence: root servers go from ~10,000 queries/sec to billions. The caching IS the disaggregation that makes DNS work at scale.
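The scaling consequence is one line of arithmetic. The numbers below are hypothetical, chosen only so that the with-caching figure matches the ~10,000 queries/sec hint from the exercise; the point is the structure, not the magnitudes.

```python
# Back-of-envelope for the exercise (hypothetical numbers, for illustration).
# If resolver caches absorb a fraction h of all lookups, the hierarchy sees
# only the (1 - h) misses. Removing caching sets h = 0.

def upstream_qps(total_lookup_qps: float, hit_rate: float) -> float:
    return total_lookup_qps * (1 - hit_rate)

total = 1e9  # assumed worldwide DNS lookups/sec (hypothetical)

with_cache = upstream_qps(total, hit_rate=0.99999)  # hypothetical hit rate
no_cache   = upstream_qps(total, hit_rate=0.0)

print(f"{with_cache:,.0f} qps upstream with caching")    # ~10,000
print(f"{no_cache:,.0f} qps upstream without caching")   # 1,000,000,000
```

A five-nines hit rate turns a billion lookups per second into ten thousand upstream queries — a 100,000x reduction. That ratio is what "the caching IS the disaggregation" means quantitatively.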

Next Time: Predictions, QUIC, and the Midterm

The framework makes concrete, falsifiable predictions about how systems behave and fail:

  1. When does distributed coordination lead to destructive scaling?
  2. When a constraint shifts, which invariant restructures first?
  3. What happens when you relax coordination constraints?

Plus: the QUIC generative exercise — tracing a full interface renegotiation through the dependency graph.

Before Thursday: review the “What the Framework Predicts” section of Ch 2.