Design Principles in Action

From Nothing to a Routable Network

Arpit Gupta

2026-04-07

Promise: from diagnosis to design

Lecture 2 gave you a diagnostic tool — four invariants that reveal why a system is built the way it is.

Today: the design toolkit — three principles that explain how solutions get built.

| Lecture | Tool | Capability |
| --- | --- | --- |
| Lecture 2 | Four invariants | Diagnose — trace any system’s dependency chain |
| Lecture 3 | Three design principles | Construct — explain how pioneers built solutions |
| Lecture 4 | Constraint-shift prediction | Predict — what breaks when the environment changes |

One backstory: from zero infrastructure to a routable, nameable Internet — and the three principles that built it.

What Lecture 2 gave you — and what’s missing

Lecture 2 traced TCP’s dependency chain AND showed how pioneers designed solutions:

  • Jacobson separated cwnd from the receiver window (two different beliefs about capacity)
  • Jacobson’s AIMD is a feedback loop: send → observe ACKs → adjust cwnd → send again
  • The FSM is distributed — forced by the constraint that no central entity controls the Internet

You already saw the three principles at work — but we never named them or asked: do these patterns recur in other systems?

Today: the same three patterns, applied to a completely different system (routing and naming). If the same patterns solve different problems under different constraints, they are principles — not TCP-specific tricks.

The Backstory: Building from Zero

It’s 1964. The network must survive attack.

Before any routing algorithm, before any IMP — the constraint came first.

“Let us consider the synthesis of a communication network which will allow several hundred major communications stations to talk with one another after an enemy attack.” — Baran, 1964

Baran’s proof: a distributed mesh with just 3–4 links per node survives heavy attack. A centralized hub (star topology) does not — destroy the center, destroy the network.

This is the binding constraint that shapes everything: the network must function without any central point of control.

Two minutes with a partner: Evaluate four topologies — star, ring, full mesh, irregular mesh — on three dimensions:

| Topology | Survivability | Cost (links for N nodes) | Routing complexity |
| --- | --- | --- | --- |
| Star | ? | ? | ? |
| Ring | ? | ? | ? |
| Full mesh | ? | ? | ? |
| Irregular mesh | ? | ? | ? |

The ARPANET: what the pioneers built

4 IMPs, irregular mesh, ~2 links per node.

Each IMP is a Honeywell DDP-516 — a minicomputer with 12 KB of RAM. In 1969, protocol layers did not exist (OSI arrived in 1980). The IMP was router + modem + host interface, all in one box: “the machine that moves packets.”

“Economic considerations mitigate against a fully connected configuration.” — Heart et al., 1970

Because the topology is sparse, packets must traverse intermediate nodes. This creates the routing problem.

The monolithic IMP: everything coupled

Imagine a naive design: every time a packet arrives, the IMP recomputes the route from scratch — queries neighbors, runs the algorithm, then forwards.

What goes wrong?

  • A packet arrives every few milliseconds (a 200-byte packet takes ~32 ms to serialize at 50 Kbps, and an IMP serves several links)
  • Route computation takes tens of milliseconds (exchange tables, run Bellman-Ford)
  • While computing, the IMP stops forwarding — packets queue and drop

The problem: two tasks with different timescales are coupled in one machine.

| Task | Speed | Frequency |
| --- | --- | --- |
| Forward a packet | Must complete in microseconds | Every packet (every few ms) |
| Recompute routes | Takes milliseconds | Only when topology changes |
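
A back-of-the-envelope check of the timescales involved (assuming the 50 Kbps lines and 200-byte packets above):

```python
# Back-of-the-envelope timescales for a 1969 IMP (assumed figures).
LINK_BPS = 50_000          # one 50 Kbps line
PACKET_BITS = 200 * 8      # a 200-byte packet

serialization_ms = PACKET_BITS / LINK_BPS * 1000
print(f"one packet occupies a link for {serialization_ms:.0f} ms")  # 32 ms

# A forwarding-table lookup is a few memory accesses -- microseconds,
# even on a DDP-516 -- while recomputing routes (exchange tables, run
# Bellman-Ford) takes tens of milliseconds. Coupling the two means every
# recomputation stalls many packets' worth of forwarding.
```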

Two minutes: How would you fix this? What would you separate?

Heart’s solution — and the first design principle

Heart separated the IMP’s work into two independent processes:

| Process | What it does | When it runs |
| --- | --- | --- |
| Forwarding | Table lookup → pick output port | Every packet (microseconds) |
| Route computation | Exchange tables → Bellman-Ford | Every 128 ms (background) |

The forwarding table is the interface between them. Forwarding reads it; route computation writes it. Neither waits for the other.

This pattern — separating concerns that operate at different timescales so each can evolve independently — is called disaggregation.

It creates an interface (the table). Change the routing algorithm → forwarding is unaffected. Speed up forwarding hardware → routing is unaffected. But: the table can be stale. That’s the cost of every disaggregation — the interface can degrade.
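
A minimal sketch of the split, with the table as the shared interface (names are illustrative, not the IMP’s actual code):

```python
# Heart's disaggregation in miniature: forwarding reads the table on
# every packet; route computation rewrites it in the background.
import threading

forwarding_table = {"Utah": "SRI", "UCSB": "UCSB"}   # dest -> next hop
table_lock = threading.Lock()

def forward(dest):
    # Fast path: a single lookup; never waits for route computation.
    with table_lock:
        return forwarding_table.get(dest)            # None = unreachable

def recompute_routes(new_table):
    # Slow path: every 128 ms in the ARPANET; here, called on demand.
    with table_lock:
        forwarding_table.clear()
        forwarding_table.update(new_table)

print(forward("Utah"))             # SRI
recompute_routes({"Utah": "UCSB"})
print(forward("Utah"))             # UCSB -- routing changed, forwarding code didn't
```

Note the cost named above: between rewrites, `forward` happily serves a stale table.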

The routing algorithm: how does the table get built?

“Any dependency between one IMP and another would merely broaden the area jeopardized by one IMP’s failure.” — Heart et al., 1970

Baran’s constraint rules out a central route computer. Each IMP must build its own table. How?

Quick recall: does anyone remember the Bellman-Ford equation?

\[d_v(x) = \min_{u \in \text{neighbors}(v)} \left[ c(v,u) + d_u(x) \right]\]

“My cost to X = minimum over all neighbors of (link cost to neighbor + neighbor’s advertised cost to X).”

Heart’s insight: run Bellman-Ford live, every 128 ms. Each IMP sends its full distance table to every neighbor. Each neighbor recomputes. After enough rounds, every IMP converges to the shortest path.

DVR in action: watch the tables converge

Round 0: Each IMP knows only its direct neighbors
┌──────────┬─────────┬─────────┬─────────┐
│          │ to UCLA │ to UCSB │ to Utah │
├──────────┼─────────┼─────────┼─────────┤
│ SRI      │ 1       │ 1       │ 1       │
│ UCLA     │ 0       │ 1       │ ∞       │
│ UCSB     │ 1       │ 0       │ ∞       │
│ Utah     │ ∞       │ ∞       │ 0       │
└──────────┴─────────┴─────────┴─────────┘
Round 1: Each IMP receives neighbors' tables, recomputes
┌──────┬─────────────┬─────────────┬─────────────┐
│      │ to UCLA     │ to UCSB     │ to Utah     │
├──────┼─────────────┼─────────────┼─────────────┤
│ SRI  │ 1           │ 1           │ 1           │
│ UCLA │ 0           │ 1           │ 2 (via SRI) │
│ UCSB │ 1           │ 0           │ 2 (via SRI) │
│ Utah │ 2 (via SRI) │ 2 (via SRI) │ 0           │
└──────┴─────────────┴─────────────┴─────────────┘

Converged in one round. On a 4-node network, this is elegant. On a 60-node network in 1978, it broke catastrophically.
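
The two rounds above can be reproduced with a few lines of synchronous Bellman-Ford (a sketch; real IMPs exchanged tables asynchronously every 128 ms):

```python
INF = float("inf")
# 1969 ARPANET links (symmetric, unit cost).
links = {("SRI", "UCLA"), ("SRI", "UCSB"), ("SRI", "Utah"), ("UCLA", "UCSB")}
nodes = ["SRI", "UCLA", "UCSB", "Utah"]
cost = {(a, b): 1 for a, b in links} | {(b, a): 1 for a, b in links}
nbrs = {n: [m for m in nodes if (n, m) in cost] for n in nodes}

# Round 0: each node knows only itself and its direct neighbors.
d = {n: {m: (0 if m == n else cost.get((n, m), INF)) for m in nodes} for n in nodes}

def dv_round(d):
    # Every node applies d_v(x) = min over neighbors u of [c(v,u) + d_u(x)].
    return {v: {x: min([d[v][x]] + [cost[(v, u)] + d[u][x] for u in nbrs[v]])
                for x in nodes} for v in nodes}

d = dv_round(d)
print(d["UCLA"]["Utah"], d["Utah"]["UCLA"])   # 2 2 -- converged in one round
```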

What does each IMP measure as “cost”?

The Bellman-Ford equation uses a “cost” — but cost of what? Each IMP needs a number for each link.

Two minutes: If you had 12 KB of RAM and needed a single number to represent “how good is this link?” — what would you measure?

Queue length: honest signal, terrible proxy

Heart’s team measured output queue length as a proxy for link delay.

Why it seems reasonable: long queue → packets wait longer → high delay → avoid this link.

Why it fails:

  • Light load: all queues empty, all costs ≈ 0 — the IMP has no basis to prefer any path
  • Heavy load: “best” path fills → cost rises → traffic shifts to alternate → original drains → cost drops → traffic shifts back → oscillation
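
The oscillation shows up even in a toy model (assumptions: cost equals queue length, the queue fills on whichever path carried last round’s traffic, and all traffic greedily picks the cheaper advertised path):

```python
# Toy model of queue-length routing oscillation between two paths.
cost = {"A": 0, "B": 5}        # initial advertised costs (assumed)
history = []
for _ in range(6):
    chosen = min(cost, key=cost.get)   # all traffic takes the cheaper path
    history.append(chosen)
    # The chosen path's queue fills (cost 10); the idle one drains (cost 0).
    cost = {p: (10 if p == chosen else 0) for p in cost}
print("".join(history))        # ABABAB -- traffic flips every round
```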

The E-M-B diagnosis: Environment (true link quality) → Measurement (queue snapshot — honest but misleading) → Belief (distance vector — built from misleading proxies).

Compare to TCP’s bufferbloat from Lecture 2: same structure, different gap. TCP’s measurement is delayed (ACKs arrive late). DV’s measurement is a bad proxy (queue length ≠ delay). Both produce belief ≠ environment.

What happens when SRI-Utah fails?

Back to our topology. The link SRI ↔ Utah fails. Everyone’s goal: reach Utah.

SRI detects the failure. Sets cost to Utah = ∞. Correct.

But UCLA and UCSB still have old tables: “Utah: cost 2, via SRI.”

Two minutes: Trace what happens next. UCLA asks UCSB “how far to Utah?” — UCSB’s table says cost 2. What does UCLA conclude? What does UCSB conclude next round? Where do packets go?

Count-to-infinity: the echo

Round 1: SRI tells neighbors "Utah = ∞"
  UCLA: SRI says ∞. But UCSB still says 2. → UCLA records: Utah via UCSB, cost 3.
  UCSB: SRI says ∞. But UCLA still says 2. → UCSB records: Utah via UCLA, cost 3.

Round 2: UCLA and UCSB exchange their NEW tables
  UCLA: UCSB now says 3. → UCLA records: Utah via UCSB, cost 4.
  UCSB: UCLA now says 3. → UCSB records: Utah via UCLA, cost 4.

Round 3: costs climb... 5, 6, 7, 8... → ∞

Packets from UCLA bounce: UCLA → UCSB → UCLA → UCSB → … forever.
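
The echo can be reproduced by replaying the failure in a synchronous DV simulation (a sketch; no split horizon, as in 1969):

```python
INF = float("inf")
# After convergence, UCLA and UCSB each believe "Utah: cost 2, via SRI";
# SRI has detected the failure, so its direct cost to Utah is now INF.
est = {"SRI": INF, "UCLA": 2, "UCSB": 2}   # each node's cost-to-Utah estimate
nbrs = {"SRI": ["UCLA", "UCSB"], "UCLA": ["SRI", "UCSB"], "UCSB": ["SRI", "UCLA"]}

trace = []
for _ in range(5):
    # Synchronous round: everyone recomputes from last round's advertisements.
    est = {v: min(1 + est[u] for u in nbrs[v]) for v in est}
    trace.append((est["UCLA"], est["UCSB"]))
print(trace)   # [(3, 3), (4, 4), (5, 5), (6, 6), (7, 7)] -- counting to infinity
```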

In 1969, there was no TTL — no mechanism to kill looping packets. IMPs used “tenacious forwarding”: hold the packet until the next hop acknowledges. Looping packets consumed IMP memory and link bandwidth until the routing tables eventually converged — or didn’t.

TTL was introduced with IP in 1981 (RFC 791) specifically to bound packet lifetime in a best-effort network.

Why did this happen? The root cause.

The root cause: DV shares computed distances (belief), not raw observations (measurement).

UCSB’s advertisement “cost 2 to Utah” is UCSB’s belief — and that belief was built from SRI’s old advertisement, which routes through the now-broken link. But UCLA has no way to know this, because DV hides the path.

Which E-M-B layer broke? (Show fingers: 1 = environment, 2 = measurement, 3 = belief)

Measurement. The neighbor’s advertisement IS the measurement signal — and it carries stale belief disguised as fresh data. Your own outdated information echoes back through your neighbor.

Two minutes: How would you fix this? What information would you share instead?

McQuillan’s fix: share measurement, not belief

Replay the SRI-Utah failure — with McQuillan’s link-state routing.

Each IMP shares only what it directly measures about its own links:

SRI floods:  "My links: UCLA=up, UCSB=up, Utah=DOWN"
UCLA floods: "My links: SRI=up, UCSB=up"
UCSB floods: "My links: SRI=up, UCLA=up"
Utah floods: "My links: SRI=DOWN"

Every IMP now has the full topology graph. UCLA runs Dijkstra, sees Utah is unreachable, sets cost = ∞. One round. Correct answer. No echo.
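
A sketch of that computation (build the graph from the flooded link states, then run Dijkstra; unit link costs assumed):

```python
import heapq

INF = float("inf")
# Surviving links after the failure, assembled from the flooded link states.
up_links = {("SRI", "UCLA"), ("SRI", "UCSB"), ("UCLA", "UCSB")}  # SRI-Utah DOWN
nodes = ["SRI", "UCLA", "UCSB", "Utah"]
graph = {n: [] for n in nodes}
for a, b in up_links:
    graph[a].append(b)
    graph[b].append(a)

def dijkstra(src):
    # Unit link costs; dist[x] == INF means unreachable.
    dist = {n: INF for n in nodes}
    dist[src] = 0
    pq = [(0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist[v]:
            continue                   # stale queue entry
        for u in graph[v]:
            if d + 1 < dist[u]:
                dist[u] = d + 1
                heapq.heappush(pq, (dist[u], u))
    return dist

print(dijkstra("UCLA")["Utah"])   # inf -- unreachable, decided in one computation
```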

Why no echo? UCSB’s advertisement is now “my link to SRI is up” — a direct measurement of its own wire. It is no longer “my cost to Utah is 2” — a processed belief hiding which path it uses. Measurement comes from the environment, not from another node’s belief.

This is disaggregation applied again: separate what you measure (link probes) from what you compute (shortest paths). Same principle Heart used for forwarding vs. routing — now applied to fix the measurement corruption that caused count-to-infinity.

If you’ve taken 176A, you know RIP and OSPF. Here’s how they connect to what we just learned:

| Year | System | Algorithm | What students know it as |
| --- | --- | --- | --- |
| 1969 | ARPANET DV (Heart) | Bellman-Ford, live, every 128 ms | The origin — what we just studied |
| 1980 | ARPANET LS (McQuillan) | Flood link states + Dijkstra | The fix we just saw |
| 1988 | RIP (RFC 1058) | Same Bellman-Ford as 1969 — packaged for IP | Heart’s DV, standardized. Still has count-to-infinity. |
| 1989 | OSPF (RFC 1131) | Same link-state as 1980 — packaged for IP | McQuillan’s LS, standardized. Replaced RIP. |

RIP inherited the count-to-infinity problem. OSPF fixed it — the same way McQuillan did.

Decision placement: why distributed?

Both DV and link-state are fully distributed — every IMP computes independently.

We already know why: Baran’s constraint. The network must survive attack. A central route computer is a single point of failure.

But what does distributed placement cost?

| | Centralized | Distributed |
| --- | --- | --- |
| Failure tolerance | Single point of failure | Survives individual node loss |
| Convergence | Instant (one computation) | Multiple rounds (DV) or flooding delay (LS) |
| Consistency | Always consistent | Temporarily inconsistent during convergence |
| Scale | Bounded by one machine | Each node carries its own load |

The constraint forced the placement. Survivability demanded distribution, even at the cost of convergence time and temporary inconsistency. When we study WiFi, a different constraint (licensed spectrum) forces the opposite placement — centralized scheduling.

Summary

Three principles — two systems — same patterns

| Principle | Routing (today) | TCP (Lecture 2) | The cost |
| --- | --- | --- | --- |
| Disaggregation | Forwarding/routing (Heart) · Measurement/belief (McQuillan) | cwnd/receiver window (Jacobson) | Interface can degrade (stale table, stale cwnd) |
| Closed-loop | DV every 128 ms: queue proxy → oscillation, stale belief → count-to-∞ | AIMD: ACK delay → bufferbloat | Loop is only as good as its measurement |
| Decision placement | Distributed (survivability — Baran) | Distributed (admin decentralization) | Convergence time, temporary inconsistency |

Different systems. Different constraints. Same three patterns. That’s what makes them principles.

Next lecture: the ARPANET was cooperative. Everyone shared topology. By 1989, the Internet had thousands of commercially sovereign ASes — AT&T, Sprint, MCI — who refused to share anything.

Write down your prediction: which invariant breaks first? How would you design routing when competitors refuse to share topology?

Thursday: BGP — and whether your prediction matches what the pioneers built.