From Nothing to a Routable Network
2026-04-07
Lecture 2 gave you a diagnostic tool — four invariants that reveal why a system is built the way it is.
Today: the design toolkit — three principles that explain how solutions get built.
| Lecture | Tool | Capability |
|---|---|---|
| Lecture 2 | Four invariants | Diagnose — trace any system’s dependency chain |
| Lecture 3 | Three design principles | Construct — explain how pioneers built solutions |
| Lecture 4 | Constraint-shift prediction | Predict — what breaks when the environment changes |
One backstory: from zero infrastructure to a routable, nameable Internet — and the three principles that built it.
Lecture 2 traced TCP’s dependency chain AND showed how pioneers designed solutions:
You already saw the three principles at work — but we never named them or asked: do these patterns recur in other systems?
Today: the same three patterns, applied to a completely different system (routing and naming). If the same patterns solve different problems under different constraints, they are principles — not TCP-specific tricks.
Before any routing algorithm, before any IMP — the constraint came first.
“Let us consider the synthesis of a communication network which will allow several hundred major communications stations to talk with one another after an enemy attack.” — Baran, 1964
Baran’s proof: a distributed mesh with just 3–4 links per node survives heavy attack. A centralized hub (star topology) does not — destroy the center, destroy the network.
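Baran's argument can be felt with a toy Monte Carlo. This is my sketch, not Baran's actual analysis: the 20-node topologies, attack sizes, and trial counts are invented for illustration.

```python
import random
from collections import defaultdict

def connected(nodes, links, killed):
    """BFS: do the surviving nodes still form one connected component?"""
    alive = set(nodes) - killed
    adj = defaultdict(list)
    for a, b in links:
        if a not in killed and b not in killed:
            adj[a].append(b)
            adj[b].append(a)
    start = next(iter(alive))
    seen, stack = {start}, [start]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == alive

def survival_rate(nodes, links, kill, trials=5000):
    """Fraction of random kill-sets after which the network stays connected."""
    wins = sum(connected(nodes, links, set(random.sample(list(nodes), kill)))
               for _ in range(trials))
    return wins / trials

N = 20
nodes = range(N)
star = [(0, i) for i in range(1, N)]                # one hub, N-1 spokes
mesh = ([(i, (i + 1) % N) for i in range(N)] +      # a ring...
        [(i, (i + 5) % N) for i in range(N)])       # ...plus chords: 4 links/node

# Random attack on 5 of 20 nodes: the mesh almost always survives;
# the star survives only when the hub happens to be spared (~75%).
print("star:", survival_rate(nodes, star, kill=5))
print("mesh:", survival_rate(nodes, mesh, kill=5))

# Targeted attack on the hub alone disconnects the star outright:
print("star minus hub:", connected(nodes, star, {0}))   # False
```

Note the mesh pays roughly twice the links of the star; survivability is bought with redundancy.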
This is the binding constraint that shapes everything: the network must function without any central point of control.
Two minutes with a partner: Evaluate four topologies — star, ring, full mesh, irregular mesh — on three dimensions:
| Topology | Survivability | Cost (links for N nodes) | Routing complexity |
|---|---|---|---|
| Star | ? | ? | ? |
| Ring | ? | ? | ? |
| Full mesh | ? | ? | ? |
| Irregular mesh | ? | ? | ? |
December 1969: 4 IMPs (UCLA, SRI, UCSB, Utah), irregular mesh, ~2 links per node.
Each IMP is a Honeywell DDP-516 — a minicomputer with 12K words (16-bit) of core memory. In 1969, protocol layers did not exist (OSI arrived in 1980). The IMP was router + modem + host interface, all in one box: “the machine that moves packets.”
“Economic considerations mitigate against a fully connected configuration.” — Heart et al., 1970
Because the topology is sparse, packets must traverse intermediate nodes. This creates the routing problem.
Imagine a naive design: every time a packet arrives, the IMP recomputes the route from scratch — queries neighbors, runs the algorithm, then forwards.
What goes wrong?
The problem: two tasks with different timescales are coupled in one machine.
| Task | Speed | Frequency |
|---|---|---|
| Forward a packet | Must complete in microseconds | Every packet (~0.3 ms) |
| Recompute routes | Takes milliseconds | Only when topology changes |
Two minutes: How would you fix this? What would you separate?
Heart separated the IMP’s work into two independent processes:
| Process | What it does | When it runs |
|---|---|---|
| Forwarding | Table lookup → pick output port | Every packet (microseconds) |
| Route computation | Exchange tables → Bellman-Ford | Every 128 ms (background) |
The forwarding table is the interface between them. Forwarding reads it; route computation writes it. Neither waits for the other.
This pattern — separating concerns that operate at different timescales so each can evolve independently — is called disaggregation.
It creates an interface (the table). Change the routing algorithm → forwarding is unaffected. Speed up forwarding hardware → routing is unaffected. But: the table can be stale. That’s the cost of every disaggregation — the interface can degrade.
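A minimal sketch of the split (Python, names invented; the real IMP did this in assembly, not with threads and dictionaries):

```python
import threading

# The forwarding table IS the interface: destination -> next hop.
forwarding_table = {"Utah": "SRI", "UCSB": "UCSB"}
table_lock = threading.Lock()

def forward(dest):
    """Fast path, runs per packet: one table lookup, never computes a route."""
    with table_lock:
        return forwarding_table.get(dest)   # pick the output port

def install_routes(new_table):
    """Slow path, runs in the background (every 128 ms on the ARPANET):
    writes the table; forwarding never waits for it to finish."""
    with table_lock:
        forwarding_table.clear()
        forwarding_table.update(new_table)

print(forward("Utah"))                # "SRI", read from the (possibly stale) table
install_routes({"Utah": "UCSB"})      # routing recomputed in the background
print(forward("Utah"))                # "UCSB", forwarding picks up the new table
```

Between the two calls, forwarding happily used the old entry; that window of staleness is exactly the degradation cost of the interface.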
“Any dependency between one IMP and another would merely broaden the area jeopardized by one IMP’s failure.” — Heart et al., 1970
Baran’s constraint rules out a central route computer. Each IMP must build its own table. How?
Quick recall: does anyone remember the Bellman-Ford equation?
\[d_v(X) = \min_{u \in \text{neighbors}(v)} \left[ c(v,u) + d_u(X) \right]\]
“My cost to X = minimum over all neighbors of (link cost to neighbor + neighbor’s advertised cost to X).”
Heart’s insight: run Bellman-Ford live, every 128 ms. Each IMP sends its full distance table to every neighbor. Each neighbor recomputes. After enough rounds, every IMP converges to the shortest path.
Round 0: Each IMP knows only its direct neighbors
┌──────┬─────────────┬─────────────┬─────────────┐
│      │ to UCLA     │ to UCSB     │ to Utah     │
├──────┼─────────────┼─────────────┼─────────────┤
│ SRI  │ 1           │ 1           │ 1           │
│ UCLA │ 0           │ 1           │ ∞           │
│ UCSB │ 1           │ 0           │ ∞           │
│ Utah │ ∞           │ ∞           │ 0           │
└──────┴─────────────┴─────────────┴─────────────┘
Round 1: Each IMP receives neighbors' tables, recomputes
┌──────┬─────────────┬─────────────┬─────────────┐
│      │ to UCLA     │ to UCSB     │ to Utah     │
├──────┼─────────────┼─────────────┼─────────────┤
│ SRI  │ 1           │ 1           │ 1           │
│ UCLA │ 0           │ 1           │ 2 (via SRI) │
│ UCSB │ 1           │ 0           │ 2 (via SRI) │
│ Utah │ 2 (via SRI) │ 2 (via SRI) │ 0           │
└──────┴─────────────┴─────────────┴─────────────┘
Converged in one round. On a 4-node network, this is elegant. On a 60-node network in 1978, it broke catastrophically.
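The rounds above can be replayed in a few lines. A synchronous-round sketch (the real IMPs exchanged tables asynchronously):

```python
INF = float("inf")

# The 1969 topology: SRI links to everyone; UCLA and UCSB also link to each other.
links = {
    "SRI":  {"UCLA": 1, "UCSB": 1, "Utah": 1},
    "UCLA": {"SRI": 1, "UCSB": 1},
    "UCSB": {"SRI": 1, "UCLA": 1},
    "Utah": {"SRI": 1},
}
nodes = list(links)

# Round 0: each IMP knows itself (0) and its direct neighbors (link cost).
dist = {v: {d: 0 if d == v else links[v].get(d, INF) for d in nodes}
        for v in nodes}

def dv_round(dist):
    """One synchronous Bellman-Ford exchange: every IMP recomputes every
    destination from its neighbors' advertised tables."""
    return {v: {d: 0 if d == v else
                min(c + dist[u][d] for u, c in links[v].items())
                for d in nodes}
            for v in nodes}

dist = dv_round(dist)
print(dist["UCLA"]["Utah"])   # 2 (via SRI), converged after one round
print(dist["Utah"]["UCSB"])   # 2 (via SRI)
```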
The Bellman-Ford equation uses a “cost” — but cost of what? Each IMP needs a number for each link.
Two minutes: If you had 12K words of memory and needed a single number to represent “how good is this link?” — what would you measure?
Heart’s team measured output queue length as a proxy for link delay.
Why it seems reasonable: long queue → packets wait longer → high delay → avoid this link.
Why it fails: a queue snapshot is instantaneous, so the cost swings packet by packet; traffic stampedes toward whichever link reports an empty queue and promptly congests it, so routes flap; and an idle slow link looks identical to an idle fast one, so queue length is a poor proxy for delay.
The E-M-B diagnosis: Environment (true link quality) → Measurement (queue snapshot — honest but misleading) → Belief (distance vector — built from misleading proxies).
Compare to TCP’s bufferbloat from Lecture 2: same structure, different gap. TCP’s measurement is delayed (ACKs arrive late). DV’s measurement is a bad proxy (queue length ≠ delay). Both produce belief ≠ environment.
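A toy model of the proxy's failure mode (all numbers invented): cost is the instantaneous queue snapshot, so every round all traffic picks the link that looked empty, and the choice flaps.

```python
# Two parallel links to the same destination; "cost" = current queue length.
queues = [0, 8]            # packets queued on links A and B
ARRIVALS, SERVICE = 10, 5  # per-round load and per-link drain rate

choices = []
for _ in range(6):
    cheap = 0 if queues[0] <= queues[1] else 1    # belief from the snapshot
    choices.append("AB"[cheap])
    queues[cheap] += ARRIVALS                     # everyone routes the same way
    queues = [max(0, q - SERVICE) for q in queues]

print("".join(choices))   # ABABAB: the route flaps every round
```

Splitting the load across both links would carry it comfortably; the snapshot-driven belief instead herds all traffic onto one link at a time. This is the oscillation that pushed McQuillan's 1980 redesign to replace the metric with measured delay.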
Back to our topology. The link SRI ↔ Utah fails. Everyone’s goal: reach Utah.
SRI detects the failure. Sets cost to Utah = ∞. Correct.
But UCLA and UCSB still have old tables: “Utah: cost 2, via SRI.”
Two minutes: Trace what happens next. UCLA asks UCSB “how far to Utah?” — UCSB’s table says cost 2. What does UCLA conclude? What does UCSB conclude next round? Where do packets go?
Round 1: SRI tells neighbors "Utah = ∞"
UCLA: SRI says ∞. But UCSB still says 2. → UCLA records: Utah via UCSB, cost 3.
UCSB: SRI says ∞. But UCLA still says 2. → UCSB records: Utah via UCLA, cost 3.
Round 2: UCLA and UCSB exchange their NEW tables
UCLA: UCSB now says 3. → UCLA records: Utah via UCSB, cost 4.
UCSB: UCLA now says 3. → UCSB records: Utah via UCLA, cost 4.
Round 3: costs climb... 5, 6, 7, 8... → ∞
Packets from UCLA bounce: UCLA → UCSB → UCLA → UCSB → … forever.
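The trace can be run directly. A synchronous-round sketch (SRI is pinned at ∞ for clarity; in the full protocol it too gets drawn into the loop):

```python
INF = float("inf")

# After the SRI-Utah link fails: SRI advertises infinity, but UCLA and UCSB
# still advertise their stale "cost 2 (via SRI)" beliefs.
cost_to_utah = {"SRI": INF, "UCLA": 2, "UCSB": 2}
links = {"UCLA": {"SRI": 1, "UCSB": 1},
         "UCSB": {"SRI": 1, "UCLA": 1}}

for r in range(1, 6):
    new = dict(cost_to_utah)
    for v in ("UCLA", "UCSB"):
        # Bellman-Ford over the neighbors' *advertised* (stale) beliefs.
        new[v] = min(c + cost_to_utah[u] for u, c in links[v].items())
    cost_to_utah = new
    print(f"round {r}: UCLA={cost_to_utah['UCLA']}, UCSB={cost_to_utah['UCSB']}")
# round 1: UCLA=3, UCSB=3
# round 2: UCLA=4, UCSB=4 ... and so on, counting up forever
```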
In 1969, there was no TTL — no mechanism to kill looping packets. IMPs used “tenacious forwarding”: hold the packet until the next hop acknowledges. Looping packets consumed IMP memory and link bandwidth until the routing tables eventually converged — or didn’t.
TTL was introduced with IP in 1981 (RFC 791) specifically to bound packet lifetime in a best-effort network.
The root cause: DV shares computed distances (belief), not raw observations (measurement).
UCSB’s advertisement “cost 2 to Utah” is UCSB’s belief — and that belief was built from SRI’s old advertisement, which routes through the now-broken link. But UCLA has no way to know this, because DV hides the path.
Which E-M-B layer broke? (Show fingers: 1 = environment, 2 = measurement, 3 = belief)
Measurement. The neighbor’s advertisement IS the measurement signal — and it carries stale belief disguised as fresh data. Your own outdated information echoes back through your neighbor.
Two minutes: How would you fix this? What information would you share instead?
Replay the SRI-Utah failure — with McQuillan’s link-state routing.
Each IMP shares only what it directly measures about its own links:
SRI floods: "My links: UCLA=up, UCSB=up, Utah=DOWN"
UCLA floods: "My links: SRI=up, UCSB=up"
UCSB floods: "My links: SRI=up, UCLA=up"
Utah floods: "My links: SRI=DOWN"
Every IMP now has the full topology graph. UCLA runs Dijkstra, sees Utah is unreachable, sets cost = ∞. One round. Correct answer. No echo.
Why no echo? UCSB’s advertisement is now “my link to SRI is up” — a direct measurement of its own wire. It is no longer “my cost to Utah is 2” — a processed belief hiding which path it uses. Measurement comes from the environment, not from another node’s belief.
This is disaggregation applied again: separate what you measure (link probes) from what you compute (shortest paths). Same principle Heart used for forwarding vs. routing — now applied to fix the measurement corruption that caused count-to-infinity.
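The same failure under link-state, sketched: flood per-link observations, assemble the graph, run Dijkstra (all link costs 1; data layout invented for illustration).

```python
import heapq

INF = float("inf")

# Each node floods only what it directly measures about its own links.
lsas = {
    "SRI":  {"UCLA": True, "UCSB": True, "Utah": False},   # Utah link is DOWN
    "UCLA": {"SRI": True, "UCSB": True},
    "UCSB": {"SRI": True, "UCLA": True},
    "Utah": {"SRI": False},
}

# Every node assembles the same topology graph from the flooded LSAs.
graph = {v: [u for u, up in nbrs.items() if up] for v, nbrs in lsas.items()}

def dijkstra(source):
    """Shortest paths over the assembled graph; unreachable nodes stay absent."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, INF):
            continue                      # stale heap entry
        for u in graph[v]:
            if d + 1 < dist.get(u, INF):
                dist[u] = d + 1
                heapq.heappush(heap, (d + 1, u))
    return dist

dist = dijkstra("UCLA")
print(dist.get("Utah", INF))   # inf: UCLA sees immediately that Utah is unreachable
print(dist["UCSB"])            # 1: no detour through stale beliefs
```

One flood, one local Dijkstra, no echo: the inputs are measurements of wires, not other nodes' beliefs.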
If you’ve taken 176A, you know RIP and OSPF. Here’s how they connect to what we just learned:
| Year | System | Algorithm | What students know it as |
|---|---|---|---|
| 1969 | ARPANET DV (Heart) | Bellman-Ford, live, 128ms | The origin — what we just studied |
| 1980 | ARPANET LS (McQuillan) | Flood link states + Dijkstra | The fix we just saw |
| 1988 | RIP (RFC 1058) | Same Bellman-Ford as 1969 — packaged for IP | Heart’s DV, standardized. Still has count-to-infinity. |
| 1989 | OSPF (RFC 1131) | Same link-state as 1980 — packaged for IP | McQuillan’s LS, standardized. Replaced RIP. |
RIP inherited the count-to-infinity problem. OSPF fixed it — the same way McQuillan did.
Both DV and link-state are fully distributed — every IMP computes independently.
We already know why: Baran’s constraint. The network must survive attack. A central route computer is a single point of failure.
But what does distributed placement cost?
|  | Centralized | Distributed |
|---|---|---|
| Failure tolerance | Single point of failure | Survives individual node loss |
| Convergence | Instant (one computation) | Multiple rounds (DV) or flooding delay (LS) |
| Consistency | Always consistent | Temporarily inconsistent during convergence |
| Scale | Bounded by one machine | Each node carries its own load |
The constraint forced the placement. Survivability demanded distribution, even at the cost of convergence time and temporary inconsistency. When we study WiFi, a different constraint (licensed spectrum) forces the opposite placement — centralized scheduling.
| Principle | Routing (today) | TCP (Lecture 2) | The cost |
|---|---|---|---|
| Disaggregation | Forwarding/routing (Heart) · Measurement/belief (McQuillan) | cwnd/receiver window (Jacobson) | Interface can degrade (stale table, stale cwnd) |
| Closed-loop | DV 128ms: queue proxy → oscillation, stale belief → count-to-∞ | AIMD: ACK delay → bufferbloat | Loop is only as good as its measurement |
| Decision placement | Distributed (survivability — Baran) | Distributed (admin decentralization) | Convergence time, temporary inconsistency |
Different systems. Different constraints. Same three patterns. That’s what makes them principles.
Next lecture: the ARPANET was cooperative. Everyone shared topology. By 1989, the Internet had thousands of commercially sovereign ASes — AT&T, Sprint, MCI — who refused to share anything.
Write down your prediction: which invariant breaks first? How would you design routing when competitors refuse to share topology?
Thursday: BGP — and whether your prediction matches what the pioneers built.
Read: Ch 6, Acts 3–5