Course: CS176C — Advanced Topics in Internet Computing, Spring 2026
Instructor: Arpit Gupta, UC Santa Barbara
Date: April 7, 2026
Pre-requisite: L2 (Four Invariants and the TCP Dependency Chain)


From diagnosis to design

Lecture 2 established the four invariants — State, Time, Coordination, Interface — as a diagnostic tool: given any networked system, trace its dependency chain and the reasons behind each design choice become visible. TCP’s chain ran Interface → Coordination → State → Time, and in tracing it students saw Jacobson separate the congestion window from the receiver window, saw AIMD as a feedback loop, saw why the finite-state machine had to be distributed. Those were design decisions, not just architectural facts. But Lecture 2 never named the patterns that produced them or asked whether those patterns recur elsewhere.

Today completes the toolkit. Three design principles — disaggregation, closed-loop reasoning, and decision placement — explain how solutions get constructed. A single backstory demonstrates all three: the journey from zero infrastructure to a routable, nameable Internet. If the same patterns that built TCP also built routing — different pioneers, different problems, different constraints — then they are structural principles, not TCP-specific tricks.

| Lecture | Tool | Capability |
|---|---|---|
| Lecture 2 | Four invariants | Diagnose: trace any system’s dependency chain |
| Lecture 3 | Three design principles | Construct: explain how pioneers built solutions |
| Lecture 4 | Constraint-shift prediction | Predict: what breaks when the environment changes |

The connection to Lecture 2 is explicit. Students already witnessed the three principles at work inside TCP: Jacobson separated cwnd from the receiver window (disaggregation), AIMD is a feedback loop of send → observe → adjust → send (closed-loop reasoning), and the FSM is distributed because no central entity controls the Internet (decision placement). Today the same three patterns appear in a completely different system — routing — built by different pioneers, solving different problems, under different constraints.


The constraint that shaped everything: survivability

Before any routing algorithm, before any Interface Message Processor, the constraint came first.

“Let us consider the synthesis of a communication network which will allow several hundred major communications stations to talk with one another after an enemy attack.” — Baran, 1964 [1]

Baran proved that a distributed mesh with just three to four links per node survives heavy attack, while a centralized hub — a star topology — does not. Destroy the center, destroy the network. This is the binding constraint that shapes everything that follows: the network must function without any central point of control [1].

The choice of topology resolves through a three-way tension among survivability, cost, and routing complexity:

| Topology | Survivability | Cost (links for N nodes) | Routing complexity |
|---|---|---|---|
| Star | 1 failure (the hub) kills everything | N−1 links (cheap) | Trivial: hub forwards everything |
| Ring | Survives 1 link break; a second break splits the network | N links (cheap) | Simple: forward clockwise or counterclockwise |
| Full mesh | Survives up to N−2 failures | N(N−1)/2 links (4,950 for 100 nodes) | Trivial: direct link to every destination |
| Irregular mesh | Survives most single failures | ~2–3 links per node | Hard: must compute paths through intermediaries |

Star is cheapest but violates Baran’s constraint: one failure kills everything. Full mesh is safest but cost is quadratic: 4,950 links for 100 nodes at $1,000/month each comes to $4.95M/month, roughly $5M, in 1969 dollars. Irregular mesh is the Goldilocks zone: sparse enough to afford, redundant enough to survive, but routing becomes the hard problem [1].
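The link counts and the monthly cost quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `link_counts` and the constant `COST_PER_LINK` are illustrative, and the $1,000 per link per month figure is the one used in the text.

```python
def link_counts(n):
    """Number of links needed to connect n nodes in each regular topology."""
    return {
        "star": n - 1,                  # every node wired to a single hub
        "ring": n,                      # every node wired to two neighbors
        "full_mesh": n * (n - 1) // 2,  # one link per unordered pair of nodes
    }

COST_PER_LINK = 1_000  # dollars per month, the figure used in the text

counts = link_counts(100)
monthly_cost = {name: links * COST_PER_LINK for name, links in counts.items()}

for name, links in counts.items():
    print(f"{name:9s} {links:5d} links  ${monthly_cost[name]:,}/month")
```

For 100 nodes the full mesh needs 4,950 links against 99 for the star, which is the quadratic-versus-linear gap that pushes the design toward a sparse irregular mesh.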


The ARPANET: four IMPs, sparse mesh, and the routing problem

The first ARPANET consisted of four IMPs in an irregular mesh with roughly two links per node. Each IMP was a Honeywell DDP-516 minicomputer with 12K 16-bit words of core memory (about 24 KB). In 1969, protocol layers did not exist (OSI arrived in 1980). The IMP was router, modem, and host interface all in one box: the machine that moves packets [2].

“Economic considerations mitigate against a fully connected configuration.” — Heart et al., 1970 [2]

UCLA, SRI, and UCSB formed a triangle — redundant paths where any one link could fail without disconnection. Utah had a single link to SRI, meaning one failure would disconnect it entirely. The average was roughly two links per node [2].

Because the topology is sparse, packets must traverse intermediate nodes. This creates the routing problem: how does UCLA’s IMP know which wire to use for a packet destined for Utah? It needs a forwarding table. The question is how the table gets built [2].


The monolithic design and the timescale mismatch

Consider a naive design: every time a packet arrives, the IMP recomputes the route from scratch (queries neighbors, runs the algorithm), then forwards. The problem emerges immediately from the numbers. At 50 Kbps, a 200-byte packet is 1,600 bits and takes about 32 ms to arrive, so a single busy line can deliver a new packet roughly every 32 ms, and an IMP serves more than one line. Route computation takes tens of milliseconds: exchanging tables, running Bellman-Ford. While computing, the IMP stops forwarding. Packets queue and drop.

Two tasks with fundamentally different timescales are coupled in one machine:

| Task | Speed | Frequency |
|---|---|---|
| Forward a packet | Must complete in microseconds | Every packet (about every 32 ms per busy line) |
| Recompute routes | Takes tens of milliseconds | Only when topology changes |

The fix is structural: separate forwarding (fast, per-packet) from route computation (slow, periodic). The forwarding table becomes a cache of routing decisions [2].


Heart’s solution and the first design principle: disaggregation

Heart separated the IMP’s work into two independent processes [2]:

| Process | What it does | When it runs |
|---|---|---|
| Forwarding | Table lookup → pick output port | Every packet (microseconds) |
| Route computation | Exchange tables → Bellman-Ford | Every 128 ms (background) |

The forwarding table is the interface between them. Forwarding reads it; route computation writes it. Neither waits for the other [2].

This pattern — separating concerns that operate at different timescales so each can evolve independently — is disaggregation. It creates an interface (the table). Change the routing algorithm and forwarding is unaffected. Speed up forwarding hardware and routing is unaffected. The cost is staleness: a packet forwarded using an outdated table goes to the wrong place.
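The table-as-interface idea can be sketched in a few lines. This is an illustration under assumptions, not the IMP's actual code: the class `ForwardingTable`, its methods `lookup` and `install`, and the port names are all hypothetical. The entries are written from UCLA's point of view.

```python
class ForwardingTable:
    """Destination -> output port. Route computation writes it; forwarding reads it."""

    def __init__(self, routes):
        self._routes = dict(routes)

    def lookup(self, destination):
        # Fast path, run for every packet: a single dictionary read.
        return self._routes.get(destination)

    def install(self, new_routes):
        # Slow path, run in the background: swap in a complete new table.
        self._routes = dict(new_routes)


# UCLA's table (hypothetical port names): Utah is reached through SRI on port0.
table = ForwardingTable({"SRI": "port0", "UCSB": "port1", "Utah": "port0"})

# Suppose SRI-Utah has just failed. Until routing finishes, forwarding still
# reads the old entry: this window is the staleness cost.
stale_port = table.lookup("Utah")

# Route computation completes and installs a table with no route to Utah.
table.install({"SRI": "port0", "UCSB": "port1"})
fresh_port = table.lookup("Utah")
```

Here `stale_port` is `"port0"` and `fresh_port` is `None`. Forwarding never calls into the routing logic, so either side can be replaced without touching the other.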

The staleness cost recurs everywhere. DNS caching is disaggregated belief — a stale TTL is the same price. DHCP leases are disaggregated allocation — an expired lease is the same price. Wherever concerns are separated by an interface, the interface can become stale.


Building the forwarding table: Bellman-Ford and distance-vector routing

Baran’s constraint rules out a central route computer — it would be a single point of failure. Each IMP must build its own table. The mechanism is the Bellman-Ford equation [4]:

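\[d_v(X) = \min_{u \in \text{neighbors}(v)} \left[ c(v,u) + d_u(X) \right]\]

Here \(d_v(X)\) is node \(v\)’s cost to destination \(X\), \(c(v,u)\) is the cost of the link from \(v\) to neighbor \(u\), and \(d_u(X)\) is the cost to \(X\) that neighbor \(u\) advertises.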

“My cost to destination X equals the minimum over all neighbors of: link cost to that neighbor plus the neighbor’s advertised cost to X.”

Heart’s implementation ran Bellman-Ford live, every 128 ms [2][4]. Each IMP sent its full distance table to every neighbor. Each neighbor recomputed. After enough rounds, every IMP converged to the shortest path.
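One synchronous round of this exchange can be sketched on the four-IMP topology. The sketch assumes every link has cost 1 and that all IMPs exchange tables in lockstep. The names `LINKS`, `build_adjacency`, `initial_tables`, and `dv_round` are illustrative, not taken from the IMP software.

```python
INF = float("inf")

NODES = ["SRI", "UCLA", "UCSB", "Utah"]

# The four links of the first ARPANET as described earlier, unit cost assumed.
LINKS = {
    ("UCLA", "SRI"): 1,
    ("UCLA", "UCSB"): 1,
    ("SRI", "UCSB"): 1,
    ("SRI", "Utah"): 1,
}

def build_adjacency(links):
    adj = {node: {} for node in NODES}
    for (a, b), cost in links.items():
        adj[a][b] = cost
        adj[b][a] = cost
    return adj

def initial_tables(adj):
    """Round 0: each node knows itself (cost 0) and its direct neighbors."""
    return {v: {x: 0 if x == v else adj[v].get(x, INF) for x in NODES}
            for v in NODES}

def dv_round(adj, tables):
    """Every node applies the Bellman-Ford update to its neighbors' last tables."""
    new_tables = {}
    for v in NODES:
        new_tables[v] = {}
        for x in NODES:
            if x == v:
                new_tables[v][x] = 0
            else:
                # minimum over neighbors u of: link cost to u + u's advertised cost to x
                new_tables[v][x] = min(adj[v][u] + tables[u][x] for u in adj[v])
    return new_tables

adj = build_adjacency(LINKS)
round0 = initial_tables(adj)
round1 = dv_round(adj, round0)
round2 = dv_round(adj, round1)
```

After one round UCLA's cost to Utah drops from infinity to 2, and `round2` equals `round1`, so under these assumptions the tables have converged.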


Convergence in action: watching the tables stabilize

Round 0 — each IMP knows only its direct neighbors:

┌──────────┬─────────┬─────────┬─────────┐
│          │ to UCLA │ to UCSB │ to Utah │
├──────────┼─────────┼─────────┼─────────┤
│ SRI      │ 1       │ 1       │ 1       │
│ UCLA     │ 0       │ 1       │ ∞       │
│ UCSB     │ 1       │ 0       │ ∞       │
│ Utah     │ ∞       │ ∞       │ 0       │
└──────────┴─────────┴─────────┴─────────┘

Round 1 — each IMP receives neighbors’ tables and recomputes:

┌──────────┬─────────────┬─────────────┬─────────────┐
│          │ to UCLA     │ to UCSB     │ to Utah     │
├──────────┼─────────────┼─────────────┼─────────────┤
│ SRI      │ 1           │ 1           │ 1           │
│ UCLA     │ 0           │ 1           │ 2 (via SRI) │
│ UCSB     │ 1           │ 0           │ 2 (via SRI) │
│ Utah     │ 2 (via SRI) │ 2 (via SRI) │ 0           │
└──────────┴─────────────┴─────────────┴─────────────┘

Converged in one round. On a 4-node network this is elegant. On a 60-node network in 1978, it broke catastrophically [2][3]. The critical observation: each IMP shares its computed distances — its belief about cost to every destination. When that belief is wrong, the system breaks.


The measurement signal: queue length as a proxy for cost

The Bellman-Ford equation uses a “cost”, but cost of what? Each IMP needs a single number to represent the quality of each link. Heart’s team chose output queue length: the number of packets waiting in the buffer [2]. Long queue means congested link, high cost, route around it. It was the cheapest signal available in 12K words of memory: just count the packets in the buffer.

The choice seems reasonable but fails under both extremes. Under light load, all queues are empty and all costs are approximately zero — the IMP has no basis to prefer any path. Under heavy load, the “best” path fills, its cost rises, traffic shifts to an alternate, the original drains, its cost drops, traffic shifts back — oscillation [2].
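Both failure modes show up in a toy model. This is a deliberately simplified sketch, not a model of the real IMP. It assumes two parallel paths named A and B, that all traffic follows the path whose queue looked shorter at the last measurement, and that the unused path drains completely within one round. The function `simulate` and its parameters are illustrative.

```python
def simulate(rounds, load=10):
    """Return the path chosen in each round when cost = last measured queue length."""
    queue = {"A": 0, "B": 0}  # measured output-queue lengths, in packets
    choices = []
    for _ in range(rounds):
        # Routing decision: the shorter queue wins; ties fall to "A" by name.
        best = min(queue, key=lambda path: (queue[path], path))
        choices.append(best)
        # All traffic moves to the chosen path: it fills while the other drains.
        queue = {path: (load if path == best else 0) for path in queue}
    return choices

heavy = simulate(6)           # heavy load: the choice flips every round
light = simulate(4, load=0)   # light load: queues stay empty, only the tie-break decides
```

Under heavy load the result alternates A, B, A, B, A, B, which is the oscillation described above. Under light load every queue reads zero, so the choice is made by the arbitrary tie-break rather than by any information about the links.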

The diagnosis maps cleanly to the Estimate-Measure-Believe framework from Lecture 2:

  • Environment — true link quality
  • Measurement — queue snapshot (honest but misleading)
  • Belief — distance vector (built from misleading proxies)

TCP’s measurement problem from Lecture 2 was delay: ACKs arrive late, so the congestion estimate is stale. DV routing’s measurement problem is a bad proxy: queue length does not equal delay. Both produce belief that diverges from environment [2].

This is the second design principle: closed-loop reasoning. Every adaptive system is a feedback loop — measure, compute, act, measure again. DV routing loops every 128 ms. TCP loops every RTT. The loop is only as good as its measurement signal.


The count-to-infinity failure: when belief echoes back as measurement

Consider the SRI–Utah link failing. SRI detects the failure and sets its cost to Utah to infinity — correct. But UCLA and UCSB still hold old tables: “Utah: cost 2, via SRI” [2][3].

What happens next: UCLA asks UCSB “how far to Utah?” UCSB’s table says cost 2. UCLA concludes: cost to Utah via UCSB is 3. Next round, UCSB sees UCLA’s new cost of 3, concludes its cost via UCLA is 4. Costs climb forever. Packets bounce UCLA → UCSB → UCLA → UCSB indefinitely [2][3].

The trace unfolds round by round:

Round 1: SRI tells neighbors "Utah = ∞"
  UCLA: SRI says ∞. But UCSB still says 2. → UCLA records: Utah via UCSB, cost 3.
  UCSB: SRI says ∞. But UCLA still says 2. → UCSB records: Utah via UCLA, cost 3.

Round 2: UCLA and UCSB exchange their NEW tables
  UCLA: UCSB now says 3. → UCLA records: Utah via UCSB, cost 4.
  UCSB: UCLA now says 3. → UCSB records: Utah via UCLA, cost 4.

Round 3: costs climb... 5, 6, 7, 8... → ∞

The critical question is whether UCSB’s advertised “cost 2” goes through a working link. It does not — it routes through SRI–Utah, which is down. But UCLA has no way to know this, because distance-vector routing shares distances, not paths [2][3].

In 1969, there was no TTL — no mechanism to kill looping packets. IMPs used “tenacious forwarding”: hold the packet until the next hop acknowledges. Looping packets consumed IMP memory and link bandwidth until routing tables eventually converged — or did not. Heart’s team set a ceiling: if cost exceeds the number of nodes, declare unreachable. The TTL field was introduced with IP in 1981 (RFC 791) specifically to bound packet lifetime in a best-effort network [2].
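The climb traced above, together with the ceiling rule, can be sketched as follows. The sketch assumes unit link costs, lockstep exchanges, and a ceiling equal to the four nodes of the initial network. The names `MAX_COST`, `apply_ceiling`, and `advertise_round` are illustrative.

```python
INF = float("inf")
MAX_COST = 4  # ceiling: the number of nodes in the four-IMP network

def apply_ceiling(cost):
    # A cost above the node count cannot be a real path: declare unreachable.
    return INF if cost > MAX_COST else cost

def advertise_round(ucla, ucsb):
    """One exchange after SRI-Utah fails. SRI advertises infinity, so each
    node's only finite offer for Utah is the other node's stale belief."""
    new_ucla = 1 + ucsb  # UCLA-UCSB link cost plus UCSB's advertised cost
    new_ucsb = 1 + ucla  # UCSB-UCLA link cost plus UCLA's advertised cost
    return apply_ceiling(new_ucla), apply_ceiling(new_ucsb)

ucla, ucsb = 2, 2  # stale beliefs: "Utah: cost 2, via SRI"
history = [(ucla, ucsb)]
while ucla != INF:
    ucla, ucsb = advertise_round(ucla, ucsb)
    history.append((ucla, ucsb))
```

The costs step through 2, 3, 4 and then hit the ceiling. Without `apply_ceiling` the loop would never terminate, which is the count-to-infinity behavior itself.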


Root cause: sharing belief instead of measurement

The root cause of count-to-infinity is that distance-vector routing shares computed distances (belief), not raw observations (measurement). UCSB’s advertisement “cost 2 to Utah” is UCSB’s belief. That belief was built from SRI’s old advertisement, which routed through the now-broken link. UCLA has no way to know this because DV hides the path [2][3].

In the Estimate-Measure-Believe framework, the layer that broke is measurement. The neighbor’s advertisement IS the measurement signal — and it carries stale belief disguised as fresh data. Your own outdated information echoes back through your neighbor [2][3].

Three fixes exist. Share the path, not just the distance; this becomes path-vector routing and eventually BGP [7]. Share the actual link states; this is McQuillan’s fix [3]. Or stop advertising a route back to the neighbor it was learned from; this is split horizon, a partial fix.
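Split horizon is the easiest of the three to sketch. The rule modeled here is that a node stays silent about a destination toward the neighbor it uses as next hop for that destination. The sketch tracks only UCLA and UCSB and their routes to Utah after the SRI–Utah failure, with unit link costs and lockstep exchanges assumed. The function `split_horizon_round` and the state layout are illustrative.

```python
INF = float("inf")
PEER = {"UCLA": "UCSB", "UCSB": "UCLA"}

def split_horizon_round(state, sri_cost):
    """state maps node -> (cost_to_Utah, next_hop); returns the next state.
    sri_cost is the cost to Utah that SRI advertises this round."""
    new_state = {}
    for node in state:
        other = PEER[node]
        other_cost, other_hop = state[other]
        # Split horizon: the peer says nothing about Utah if its route runs through us.
        heard_from_peer = INF if other_hop == node else other_cost
        offers = {"SRI": 1 + sri_cost, other: 1 + heard_from_peer}
        best_hop = min(offers, key=offers.get)
        best_cost = offers[best_hop]
        new_state[node] = (best_cost, best_hop if best_cost < INF else None)
    return new_state

state = {"UCLA": (2, "SRI"), "UCSB": (2, "SRI")}
after_round1 = split_horizon_round(state, sri_cost=INF)
after_round2 = split_horizon_round(after_round1, sri_cost=INF)
```

In round 1 both nodes still adopt each other's stale cost of 2 and record cost 3. In round 2 each is silent toward the other, so both fall to infinity instead of climbing. The fix is partial because a loop through three or more routers is not broken by this rule.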


McQuillan’s fix: share measurement, not belief

McQuillan’s 1980 redesign solved count-to-infinity by changing what each IMP advertises [3]. Instead of sharing computed distances, each IMP shares only what it directly measures about its own links:

SRI floods:  "My links: UCLA=up, UCSB=up, Utah=DOWN"
UCLA floods: "My links: SRI=up, UCSB=up"
UCSB floods: "My links: SRI=up, UCLA=up"
Utah floods: "My links: SRI=DOWN"

Every IMP now has the full topology graph. UCLA runs Dijkstra, sees Utah is unreachable, sets cost to infinity. One round. Correct answer. No echo [3].

The echo disappears because UCSB’s advertisement is now “my link to SRI is up” — a direct measurement of its own wire — rather than “my cost to Utah is 2,” a processed belief hiding which path it uses. Measurement comes from the environment, not from another node’s belief [3].

This is disaggregation applied again: separate what you measure (link probes) from what you compute (shortest paths). The same principle Heart used for forwarding versus routing is now applied to fix the measurement corruption that caused count-to-infinity [3].
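The link-state approach can be sketched in two steps: rebuild the graph from the flooded advertisements, then run Dijkstra locally. This is a minimal illustration, not McQuillan's implementation. It assumes unit cost on every link that is up, and the names `LSAS`, `build_graph`, and `dijkstra` are hypothetical.

```python
import heapq

INF = float("inf")

# Flooded advertisements after the SRI-Utah failure, as listed above.
LSAS = {
    "SRI":  {"UCLA": "up", "UCSB": "up", "Utah": "down"},
    "UCLA": {"SRI": "up", "UCSB": "up"},
    "UCSB": {"SRI": "up", "UCLA": "up"},
    "Utah": {"SRI": "down"},
}

def build_graph(lsas):
    """Keep only links reported up; each gets an assumed cost of 1."""
    graph = {node: {} for node in lsas}
    for node, links in lsas.items():
        for neighbor, status in links.items():
            if status == "up":
                graph[node][neighbor] = 1
    return graph

def dijkstra(graph, source):
    """Shortest-path cost from source to every node in the graph."""
    dist = {node: INF for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, cost in graph[u].items():
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

distances = dijkstra(build_graph(LSAS), "UCLA")
```

UCLA's computed cost to Utah is infinity after a single flood, because no advertisement reports a working link into Utah. No neighbor's processed belief enters the computation, so there is nothing to echo.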

The historical lineage maps directly to the systems taught in CS 176A:

| Year | System | Algorithm | 176A name |
|---|---|---|---|
| 1969 | ARPANET DV (Heart) | Bellman-Ford, live, 128 ms | The origin |
| 1980 | ARPANET LS (McQuillan) | Flood link states + Dijkstra | The fix |
| 1988 | RIP (RFC 1058) | Same Bellman-Ford as 1969, packaged for IP | Heart’s DV, standardized. Still has count-to-infinity [5]. |
| 1989 | OSPF (RFC 1131 → RFC 2328) | Same link-state as 1980, packaged for IP | McQuillan’s LS, standardized. Replaced RIP [6]. |

Decision placement: why distributed?

Both distance-vector and link-state routing are fully distributed — every IMP computes independently. The reason is Baran’s constraint [1]. The network must survive attack. A central route computer is a single point of failure.

But distributed placement has costs:

| | Centralized | Distributed |
|---|---|---|
| Failure tolerance | Single point of failure | Survives individual node loss |
| Convergence | Instant (one computation) | Multiple rounds (DV) or flooding delay (LS) |
| Consistency | Always consistent | Temporarily inconsistent during convergence |
| Scale | Bounded by one machine | Each node carries its own load |

The constraint forced the placement. Survivability demanded distribution, even at the cost of convergence time and temporary inconsistency [1][2].

This is the third design principle: decision placement. The principle does not prescribe which placement to choose. It establishes that the binding constraint determines the answer. Baran’s survivability demands distribution. Licensed spectrum grants centralized scheduling (the WiFi and cellular story, coming later in the course). A single-admin datacenter enables centralized control (SDN, coming in Lecture 4).


Three principles, two systems, same patterns

| Principle | Routing (today) | TCP (Lecture 2) | The cost |
|---|---|---|---|
| Disaggregation | Forwarding/routing (Heart) · Measurement/belief (McQuillan) | cwnd/receiver window (Jacobson) | Interface can degrade (stale table, stale cwnd) |
| Closed-loop | DV 128 ms: queue proxy → oscillation, stale belief → count-to-∞ | AIMD: ACK delay → bufferbloat | Loop is only as good as its measurement |
| Decision placement | Distributed (survivability, Baran) | Distributed (admin decentralization) | Convergence time, temporary inconsistency |

Different systems. Different constraints. Same three patterns. That is what makes them principles — not TCP-specific tricks, not routing-specific tricks, but recurring structural responses to the problem of building adaptive distributed systems.


Looking forward: when cooperation breaks

The ARPANET was cooperative. Every router shared topology honestly: no secrets, no filtering, no policy. This worked because the ARPANET was a single trust domain: one organization, one administrative authority [6]. By 1989, the Internet had become an interconnection of thousands of commercially sovereign autonomous systems (AT&T, Sprint, MCI, universities, corporations) that refused to share their internal topology with competitors. The trust assumption that made OSPF’s full-topology sharing possible no longer held across organizational boundaries.

Which invariant breaks first? What do you design when competitors refuse to share topology? Lecture 4 takes OSPF as a baseline and applies two different constraint shifts — the disappearance of trust (producing BGP) and the explosion of scale and cost (producing SDN/OpenFlow) — to show that the framework does not merely diagnose existing systems but predicts new architectures from constraint changes alone [7].


References

[1] P. Baran, “On Distributed Communications Networks,” IEEE Trans. Communications Systems, vol. CS-12, no. 1, pp. 1–9, March 1964.

[2] F. E. Heart, R. E. Kahn, S. M. Ornstein, W. R. Crowther, and D. C. Walden, “The Interface Message Processor for the ARPA Computer Network,” Proc. AFIPS Spring Joint Computer Conference, pp. 551–567, 1970.

[3] J. M. McQuillan, I. Richer, and E. C. Rosen, “The New Routing Algorithm for the ARPANET,” IEEE Trans. Communications, vol. COM-28, no. 5, pp. 711–719, May 1980.

[4] R. Bellman, “On a Routing Problem,” Quarterly of Applied Mathematics, vol. 16, no. 1, pp. 87–90, 1958.

[5] C. Hedrick, “Routing Information Protocol,” RFC 1058, June 1988.

[6] J. Moy, “OSPF Version 2,” RFC 2328, April 1998.

[7] Y. Rekhter, T. Li, and S. Hares, “A Border Gateway Protocol 4 (BGP-4),” RFC 4271, January 2006.