Lecture 4: Design BGP and SDN From Scratch — The Framework as a Design Tool

Course: CS176C — Advanced Topics in Internet Computing, Spring 2026
Instructor: Arpit Gupta, UC Santa Barbara
Date: April 9, 2026
Slides: Deployed slide deck
Pre-requisite: L3 (Design Principles in Action)

The framework becomes a design tool

By the end of Lecture 3, the toolkit was complete: four invariants (State, Time, Coordination, Interface), three design principles (decision placement, disaggregation, closed-loop reasoning), and the Environment-Measurement-Belief (E-M-B) decomposition that classifies every gap between what a protocol knows and what the environment actually is. Until now, the framework has been analytic, applied after the fact to systems that already exist. Today it becomes generative. Starting from a single baseline system, OSPF, two different environmental changes produce two entirely different architectures: BGP and SDN. The framework predicts both from constraint shifts alone.

The structure is two design exercises from the same starting point. In the first, trust disappears and OSPF’s State answer breaks across commercial boundaries; the result is BGP. In the second, scale and cost explode and OSPF’s Coordination answer becomes economically unsustainable; the result is SDN/OpenFlow. Both exercises demonstrate that when the binding constraint shifts, the framework generates the correct architectural response — not by guessing, but by tracing which invariant answer can no longer hold.

OSPF as baseline: the dependency graph

OSPF’s binding constraint is survivability — the network must continue operating despite failures [1]. From that single constraint, the four invariant answers follow in a dependency chain:

Invariant	OSPF’s answer	Forced by	Design principle
Coordination	Distributed — each router computes independently	Survivability → no central point of failure	Decision placement
State	Full topology — every router knows every link and cost	Distributed → need shared truth to avoid loops	Disaggregation: measurement from belief [3]
Time	Event-driven flooding, convergence within seconds	Full topology → changes trigger immediate reflooding	Closed-loop: fast, honest feedback
Interface	LSAs — the format for exchanging raw link measurements	Cooperative trust → share everything, hide nothing	Honest measurement signal

The assumption holding this together is cooperation. Every router shares everything honestly. No secrets, no filtering, no policy. This works because OSPF operates within a single trust domain — one organization, one administrative authority [6]. The question is what happens when that assumption breaks.
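A minimal sketch of OSPF's State answer, assuming a hypothetical four-router topology: after flooding, every router holds the same link-state database (LSDB) and runs Dijkstra over it independently. The names LSDB and shortest_paths are illustrative; real OSPF adds areas, LSA sequence numbers and aging, and equal-cost multipath, none of which are modeled here.

```python
import heapq

# Hypothetical link-state database: after flooding, every router holds this same map.
LSDB = {
    "A": {"B": 10, "C": 3},
    "B": {"A": 10, "C": 4, "D": 2},
    "C": {"A": 3, "B": 4, "D": 8},
    "D": {"B": 2, "C": 8},
}

def shortest_paths(lsdb, source):
    """Dijkstra over the shared topology; returns (distance, next hop) per destination."""
    dist = {source: 0}
    next_hop = {source: None}
    heap = [(0, source, None)]  # (cost so far, node, first hop used to reach it)
    visited = set()
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry: a cheaper path was already settled
        visited.add(node)
        for nbr, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                # Direct neighbors of the source are their own first hop.
                next_hop[nbr] = nbr if node == source else first
                heapq.heappush(heap, (new_cost, nbr, next_hop[nbr]))
    return dist, next_hop

dist, next_hop = shortest_paths(LSDB, "A")
print(dist)      # {'A': 0, 'B': 7, 'C': 3, 'D': 9}
print(next_hop)  # {'A': None, 'B': 'C', 'C': 'C', 'D': 'C'}
```

Because every router runs the same computation over the same honest map, their forwarding choices agree in steady state. That agreement is exactly what the cooperation assumption buys.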

Exercise 1: commercialization breaks the State invariant

ARPANET was decommissioned in 1990, and over the following years the Internet became an interconnection of independent commercial networks. The environment shifted from a single cooperative research community to thousands of organizations (AT&T, Sprint, universities, corporations) with competing business interests [7]. The tool at hand was still OSPF. The question is whether it remains the right tool.

	ARPANET	Commercial Internet
Who operates it	One cooperative research community	Thousands of organizations with disparate interests
Relationship	Collaborative — shared goals	Commercial — competing business interests
The problem	Route within one backbone	Organizations must exchange traffic despite commercial competition

The invariant that breaks is State. OSPF’s State answer requires sharing full topology honestly — every link, every cost, via Link-State Advertisements. AT&T will never share its 500-router internal topology with Sprint. Internal topology is a trade secret. The cooperative trust that made full-topology sharing possible no longer exists across organizational boundaries [7].

The consequence is a forced disaggregation. OSPF continues to work inside each organization: single administrative domain, full trust. But it fails across organizations. Routing must split: one system inside (intra-domain), a different system between (inter-domain). This separation creates gateway routers, routers at the boundary of an autonomous system (AS) that speak both protocols. Inside: OSPF. Outside: the new inter-domain protocol that must be designed [7][10].

What the gateway router can advertise

The State failure constrains what information crosses organizational boundaries. OSPF advertises link states — raw topology like “Link A-B, cost 10” at router-level granularity. But across trust boundaries, internal routers are hidden. The outside world cannot know AT&T’s router names, link capacities, or traffic engineering policies.

The right unit of advertisement is prefixes — groups of IP addresses the AS is responsible for (e.g., 208.65.152.0/22) — plus the AS-level path for loop detection (e.g., [AS7018, AS3356, AS15169]). This is a path vector: more information than a distance vector (which hides the path entirely and enables loops), less information than full topology (which no competitor will disclose). It represents the maximum that commercial entities will share [7].

What distance-vector shares	What path vector shares	What OSPF shares
Distance only	Prefix + AS-level path	Every link, every cost
Hides path → loops	Shows AS path → loop detection	Hides nothing
Too little	Just enough	Too much (across trust boundaries)
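The path-vector rules can be sketched in a few lines. This is a simplified model, not BGP's wire format: a route carries only a prefix and an AS path, each AS prepends its own number when exporting, and an AS discards any route whose path already contains its own number. The Route type and the export/accept helpers are hypothetical names; the prefix and ASNs are the examples from the text, and AS64500 is a documentation-range ASN standing in for an unrelated AS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str     # the only destination information that crosses the boundary
    as_path: tuple  # AS-level path, nearest AS first, origin AS last

def export(route, my_asn):
    """Advertise to a neighbor AS: prepend our ASN. Internal routers and costs never appear."""
    return Route(route.prefix, (my_asn,) + route.as_path)

def accept(route, my_asn):
    """Import check: our own ASN in the path means the route has looped back to us."""
    return my_asn not in route.as_path

# The origin AS announces its prefix; every AS along the way prepends itself.
announced = Route("208.65.152.0/22", ())
at_3356 = export(export(announced, 15169), 3356)
at_7018 = export(at_3356, 7018)
print(at_7018.as_path)         # (7018, 3356, 15169)
print(accept(at_7018, 64500))  # True: AS64500 is not on the path
print(accept(at_7018, 3356))   # False: AS3356 would be accepting its own route back
```

The receiver learns which ASes the route crosses, which is enough to detect loops, and nothing about the routers or links inside them.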

The E-M-B gap becomes structural

The inter-domain protocol has a fundamentally different E-M-B gap than any system encountered previously. In OSPF, measurement equals environment — every router sees every link honestly, and the gap is zero. In the inter-domain case, the gap is permanent and by design. Commercial organizations deliberately hide internal topology, capacity, congestion, and cost. The protocol must be designed to work despite this permanent information deficit [7][8].

Gap type	System	Cause	Fixable?
Accidentally noisy	TCP bufferbloat	Measurement honest but delayed	Yes — better estimators
Circular belief	DV count-to-infinity	Stale belief echoes back	Yes — share raw measurement [3]
Structurally filtered	Inter-domain routing	Sender deliberately hides information	No — filtering is intentional

This is a new category: the structurally filtered gap. The system cannot be fixed by improving measurement fidelity or breaking circular dependencies. The information is absent because the source refuses to provide it. The protocol must make correct decisions with permanently incomplete knowledge [7][8].
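A toy illustration of why no better estimator closes this gap, using hypothetical ASNs from the documentation range (64496–64511) and made-up internal hop counts: the receiver can rank routes only by what crosses the boundary (here, AS-path length), while the quantity that matters depends on internal topology it never sees.

```python
# What crosses the boundary: two candidate routes to one prefix, as AS paths only.
advertised = {
    "via_X": (64501, 64502),
    "via_Y": (64503, 64504, 64505),
}

# What stays hidden: router hops inside each AS (hypothetical numbers).
hidden_internal_hops = {64501: 9, 64502: 6, 64503: 2, 64504: 3, 64505: 2}

def believed_cost(as_path):
    """The receiver's belief: AS-path length is all it can compute."""
    return len(as_path)

def actual_cost(as_path):
    """The environment: total router hops, computable only with the hidden topology."""
    return sum(hidden_internal_hops[asn] for asn in as_path)

believed_best = min(advertised, key=lambda name: believed_cost(advertised[name]))
actual_best = min(advertised, key=lambda name: actual_cost(advertised[name]))
print(believed_best, actual_best)  # via_X via_Y
```

Nothing the receiver does with the data it has can recover the 15-versus-7 hop comparison; only a change in what the senders disclose could.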

Time and selection under structural filtering

The Time answer for the inter-domain protocol reflects a lesson learned from distance-vector's instability. DV ran at 128 ms update intervals and oscillated [2]. OSPF floods within seconds inside one administrative domain [6]. The inter-domain protocol operates across thousands of ASes exchanging prefix-plus-path updates, and at this scale speed is dangerous. A 30-second minimum between updates for the same prefix to a given neighbor prioritizes stability over speed. The cost is 3–15 minutes of convergence after failures. This is closed-loop reasoning applied to the update frequency itself: the DV experience showed that fast, unrestricted updates produce oscillation, not convergence, and the risk only grows across thousands of independently operated networks [7].
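A sketch of the rate limit, assuming a per-neighbor, per-prefix timer as the text describes. RFC 4271 calls it MinRouteAdvertisementIntervalTimer and suggests 30 seconds for external peers; implementations differ in how they scope the timer and whether they delay withdrawals, and the MraiLimiter class here is hypothetical. Updates that arrive while the timer runs are coalesced, so neighbors never see the intermediate paths explored during convergence.

```python
MRAI_SECONDS = 30  # RFC 4271's suggested value for external (eBGP) peers

class MraiLimiter:
    """At most one advertisement per (neighbor, prefix) per interval; extras are coalesced."""

    def __init__(self, interval=MRAI_SECONDS):
        self.interval = interval
        self.last_sent = {}  # (neighbor, prefix) -> time of the last advertisement
        self.pending = {}    # (neighbor, prefix) -> newest route waiting for the timer

    def offer(self, now, neighbor, prefix, route):
        """A new best route was chosen: send it now or hold it until the timer expires."""
        key = (neighbor, prefix)
        last = self.last_sent.get(key)
        if last is None or now - last >= self.interval:
            self.last_sent[key] = now
            self.pending.pop(key, None)
            return [(now, neighbor, prefix, route)]
        self.pending[key] = route  # overwrite: intermediate routes are never sent
        return []

    def tick(self, now):
        """Send every held route whose timer has expired."""
        sent = []
        for key, route in list(self.pending.items()):
            if now - self.last_sent[key] >= self.interval:
                self.last_sent[key] = now
                del self.pending[key]
                sent.append((now, key[0], key[1], route))
        return sent

# Path exploration after a failure: four successive best routes within 20 seconds.
limiter = MraiLimiter()
sent = []
for t, route in [(0, "path1"), (5, "path2"), (12, "path3"), (20, "path4")]:
    sent += limiter.offer(t, "neighborA", "203.0.113.0/24", route)
sent += limiter.tick(29)  # timer still running: nothing sent
sent += limiter.tick(30)  # timer expired: only the newest route goes out
print(sent)
# [(0, 'neighborA', '203.0.113.0/24', 'path1'), (30, 'neighborA', '203.0.113.0/24', 'path4')]
```

Four internal changes become two external updates; the price is that the neighbor learns the final path 10 seconds after it was chosen.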

Selection presents a different challenge. OSPF picks shortest path — an objective metric optimized cooperatively. But commercial ASes have business preferences that override objective metrics. A paying customer’s longer path beats a competitor’s shorter path. LOCAL_PREF overrides shortest-path selection. The protocol enforces business relationships, not optimal routing. This is decision placement at work: each AS applies local policy autonomously because no global authority exists to impose a unified objective [7][8].
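The selection rule can be sketched as a sort key. The LOCAL_PREF values below are hypothetical (operators choose their own, though ranking customer over peer over provider is the common convention), the ASNs are from the documentation range, and the sketch stops after AS-path length; BGP's full decision process continues with further tie-breakers.

```python
from dataclasses import dataclass

# Hypothetical LOCAL_PREF values: routes via paying customers beat routes via
# settlement-free peers, which beat routes that cost money via providers.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}

@dataclass
class Candidate:
    learned_from: str  # business relationship with the neighbor that sent the route
    as_path: tuple

def best_route(candidates):
    """Highest LOCAL_PREF first; only among equals does the shorter AS_PATH decide.

    Later tie-breakers (ORIGIN, MED, eBGP over iBGP, IGP cost, router ID) are omitted.
    """
    return min(candidates, key=lambda c: (-LOCAL_PREF[c.learned_from], len(c.as_path)))

routes = [
    Candidate("provider", (64510, 64511)),                # 2 AS hops
    Candidate("customer", (64500, 64501, 64502, 64511)),  # 4 AS hops
]
chosen = best_route(routes)
print(chosen.learned_from, len(chosen.as_path))  # customer 4
```

The four-hop customer route wins because the comparison on AS-path length is never reached; shortest path survives only as a tie-breaker within one business class.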

The result is BGP

The design that emerges from tracing the framework through the trust failure matches BGP as specified in RFC 4271 [7]:

Invariant	Design from framework	BGP (RFC 1105, 1989; BGP-4: RFC 4271 [7])	Design principle
Coordination	Each AS decides by local business policy	LOCAL_PREF > AS_PATH length > tie-breakers	Decision placement: maximally distributed
State	Path vector — prefix + AS path (structurally filtered)	AS_PATH + policy attributes; permanent E-M-B gap	Closed-loop: design measurement signal given privacy
Time	Slow updates for stability	30-second minimum; 3–15 min convergence	Closed-loop: stability over speed (DV’s lesson)
Interface	Disaggregated from internal routing	BGP ↔ OSPF/IS-IS boundary	Disaggregation: inter-domain from intra-domain

The summary: OSPF to BGP is a State failure. Trust changed, the State invariant answer could no longer hold, and every other answer adapted to function under the structurally filtered gap that resulted.

BGP’s deeper lessons

Three consequences follow from the structurally filtered gap.

First, stability is institutional, not algorithmic. Griffin et al. (2002) proved that deciding whether an arbitrary set of BGP policies admits a stable routing solution is NP-complete, and that some policy combinations admit no stable solution at all [8]. Gao and Rexford (2001) showed that BGP is guaranteed to converge when policies follow the customer-provider hierarchy, the economic relationships that structure the commercial Internet [9]. The E-M-B gap is permanent, and the system works not because the algorithm guarantees convergence but because economic incentives steer operators away from pathological policy combinations.
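The existence of unsolvable policy combinations can be checked by brute force on the smallest classic example, BAD GADGET from Griffin et al. [8]: three ASes arranged around a destination (AS 0), each allowed to route directly or through one neighbor in the ring, and each preferring the route through that neighbor. The code is a hypothetical sketch of the stable-paths check for this one instance; exhaustive search does not scale, which is the point of the NP-completeness result. Flipping every AS's preference to favor its direct route yields a unique stable solution.

```python
from itertools import product

NODES = [1, 2, 3]
NEXT = {1: 2, 2: 3, 3: 1}  # each AS's neighbor around the ring; the destination is AS 0

def permitted(i, prefer_neighbor):
    """Permitted paths for AS i, most preferred first."""
    via, direct = (i, NEXT[i], 0), (i, 0)
    return [via, direct] if prefer_neighbor else [direct, via]

def best_response(i, assignment, prefer_neighbor):
    """Most preferred permitted path consistent with what the neighbor currently uses."""
    for path in permitted(i, prefer_neighbor):
        if len(path) == 2 or assignment[path[1]] == path[1:]:
            return path
    return None  # not reached: the direct path is always available

def stable_assignments(prefer_neighbor):
    """Exhaustively find assignments in which no AS wants to switch paths."""
    options = [permitted(i, prefer_neighbor) for i in NODES]
    stable = []
    for combo in product(*options):
        assignment = dict(zip(NODES, combo))
        if all(best_response(i, assignment, prefer_neighbor) == assignment[i] for i in NODES):
            stable.append(assignment)
    return stable

print(stable_assignments(prefer_neighbor=True))   # []  (BAD GADGET: no stable solution)
print(stable_assignments(prefer_neighbor=False))  # [{1: (1, 0), 2: (2, 0), 3: (3, 0)}]
```

In BAD GADGET, any AS routing directly makes its ring neighbor want to route through it, which pushes the next AS back to its direct route; around an odd cycle of three ASes, that alternation can never settle.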

Second, trust and verification remain unsolved. In 2008, Pakistan Telecom announced a more-specific route covering part of YouTube's address space. BGP accepted it because no origin authentication existed, and longest-prefix matching sent traffic toward the more-specific route. YouTube was unreachable from much of the Internet for about two hours. BGP inherited cooperative trust assumptions from its ARPANET heritage without verification mechanisms. RPKI (2012) partially fixes this through origin validation, but deployment remains incomplete: the same deployability meta-constraint that slows every Internet-wide upgrade.
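The mechanics of the hijack and of RPKI origin validation fit in a short sketch. In 2008 the hijacker announced 208.65.153.0/24, a more-specific part of YouTube's 208.65.152.0/22, and longest-prefix matching prefers the /24 no matter who originated it. The ASNs below are hypothetical stand-ins from the documentation range, and the ROA is hypothetical (none existed in 2008). The valid/invalid/not-found outcomes follow the origin-validation rules of RFC 6811, simplified to a list of (prefix, maxLength, origin) entries.

```python
import ipaddress

LEGIT_ORIGIN = 64500  # hypothetical stand-in for YouTube's AS
HIJACKER = 64511      # hypothetical stand-in for the hijacking AS

# Routes to YouTube's space after the bogus announcement: the real /22 and the hijacked /24.
rib = {
    ipaddress.ip_network("208.65.152.0/22"): LEGIT_ORIGIN,
    ipaddress.ip_network("208.65.153.0/24"): HIJACKER,
}

def longest_prefix_match(rib, addr):
    """Forwarding uses the most specific covering prefix; the origin AS plays no role."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in rib if ip in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None

# Hypothetical ROA data: (authorized prefix, maxLength, authorized origin AS).
roas = [(ipaddress.ip_network("208.65.152.0/22"), 22, LEGIT_ORIGIN)]

def origin_validation(prefix, origin, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found' (RFC 6811 style)."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, asn in roas:
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if net.prefixlen <= max_len and origin == asn:
                return "valid"
    return "invalid" if covered else "not-found"

winner = longest_prefix_match(rib, "208.65.153.10")
print(winner, rib[winner] == HIJACKER)                           # 208.65.153.0/24 True
print(origin_validation("208.65.153.0/24", HIJACKER, roas))      # invalid
print(origin_validation("208.65.152.0/22", LEGIT_ORIGIN, roas))  # valid
print(origin_validation("198.51.100.0/24", 64499, roas))         # not-found
```

Under this maxLength, even a /24 announced by the legitimate origin would be classified invalid. And the check protects only networks that actually perform it, which is the deployment gap the text describes.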

Third, a tension emerges that bridges to the second exercise. BGP selects one best path per prefix — stability demands simplicity. But operators want finer control: route video one way, bulk transfers another. Every additional policy rule (communities, route maps, prefix-list filters) adds complexity per device. Precision competes with stability. Control competes with cost. This same tension exists inside organizations — and it will break the architecture.

Exercise 2: scale and cost break the Coordination invariant

Return to OSPF operating inside a single organization — datacenter, campus, WAN. The trust assumption still holds (single administrative domain), so State is fine. But three forces converged to make OSPF’s architecture economically unsustainable:

Scale exploded. Cloud providers built datacenters with tens of thousands of switches. Each device running link-state flooding plus Dijkstra plus line-rate forwarding required expensive compute and memory. High-end routers cost $500K or more, and a vendor market concentrated in Cisco and Juniper trapped organizations in a hardware cost spiral.

Precision demanded bloat. OSPF routes by shortest path to IP prefix. Organizations wanted per-application routing: video on path A, conferencing on path B, bulk transfers on path C. Per-application rules require more FIB/TCAM entries per device — more memory, higher cost. Every additional rule inflates every device in the network.

Policy required touching every device. Each router owns its own control plane, so changing routing policy means reconfiguring every router individually. Casado et al. (2007) cited an industry study attributing 62% of downtime in multi-vendor networks to human error [10], a predictable outcome when network-wide policy is expressed through thousands of lines of local configuration on individual devices.

Identifying the invariant under pressure

All three forces trace back to one architectural choice: every router computes its own control plane. Route computation, policy expression, traffic engineering — all distributed across every device. The invariant under pressure is Coordination. If distributed coordination is the problem, the alternative is to centralize path computation.

But Baran showed in 1964 that distributed coordination was essential for survivability [1]. Centralizing routing creates a single point of failure. This is exactly the tension the SDN pioneers faced. The resolution lies in recognizing what changed between 1964 and 2004:

	Baran’s context (1964)	SDN’s context (2004)
Domain	National network across hostile territory	Datacenter/campus you fully control
Threat	Nuclear attack — centralization = one bomb destroys routing	Operational complexity — distributed config = 62% downtime
Central point	Physically vulnerable, unreplicable	Replicable — 3 controllers in different racks, millisecond failover

Inside your own domain, you can replicate controllers, add fast failover, and monitor health continuously. Logically centralized does not mean physically centralized. The operational data suggested that hand-configuring many distributed devices (with 62% of downtime attributed to human error [10]) was a larger risk than a replicated controller. The binding constraint shifted from survivability against an external adversary to manageability of internal complexity.
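The replication argument is arithmetic, sketched here under a strong assumption: controller replicas fail independently, which is the reason for placing them in different racks. The per-replica unavailability p is a made-up number, and the function names are illustrative. With simple failover, control is lost only if every replica is down; with quorum-based (consensus) replication, it is lost once a majority is down.

```python
from math import comb

def unavailability_failover(p, n):
    """Simple failover: control is lost only if all n replicas are down together."""
    return p ** n

def unavailability_quorum(p, n):
    """Quorum replication: control is lost once a majority of the n replicas is down."""
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))

p = 0.001  # hypothetical fraction of time a single controller replica is down
for n in (1, 3):
    print(n, f"{unavailability_failover(p, n):.3g}", f"{unavailability_quorum(p, n):.3g}")
# 1 0.001 0.001
# 3 1e-09 3e-06
```

Either way, three replicas cut unavailability by orders of magnitude, but only while the independence assumption holds. A shared power feed or a software bug common to all replicas breaks it, which is why a logically centralized controller still demands careful physical placement.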

Disaggregation: separating control from data

The control plane and data plane are coupled in every router — the same monolithic pattern that Heart identified in the original IMP design [2]. Heart separated forwarding from routing in 1969 — same box, different processes. SDN applies the same disaggregation principle more aggressively: pull the control plane out of every router entirely [11][12].

Switches become simple match-action engines. A separate controller computes routes, policies, and traffic engineering centrally. Heart separated processes; SDN separates devices. The disaggregation is deeper because the cost constraint demands it. Feamster et al. (2004) asked the question directly: “Why should network-wide routing decisions be implemented through thousands of lines of local configuration on individual, distributed devices?” [12]. The answer, increasingly, was that they should not.
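The switch side of the split can be sketched as a priority-ordered match-action table, loosely modeled on an OpenFlow flow table: an entry matches when every field it names equals the packet's value (unnamed fields are wildcards), the highest-priority match wins, and a miss is handed to the controller. The field names ip_dst and tcp_dst, the action strings, and the Switch and FlowEntry classes are hypothetical simplifications, not OpenFlow's actual message format.

```python
from dataclasses import dataclass, field

@dataclass
class FlowEntry:
    priority: int
    match: dict  # header field -> required value; fields not listed are wildcards
    actions: list

@dataclass
class Switch:
    """A control-plane-free switch: it only looks packets up in tables the controller fills."""
    table: list = field(default_factory=list)

    def install(self, entry):
        """Called by the controller; keeps the table ordered by descending priority."""
        self.table.append(entry)
        self.table.sort(key=lambda e: e.priority, reverse=True)

    def process(self, packet):
        """Return the actions of the highest-priority matching entry."""
        for entry in self.table:
            if all(packet.get(f) == v for f, v in entry.match.items()):
                return entry.actions
        return ["send_to_controller"]  # table miss: the switch cannot decide on its own

sw = Switch()
sw.install(FlowEntry(10, {"ip_dst": "10.0.0.5"}, ["output:1"]))
sw.install(FlowEntry(20, {"ip_dst": "10.0.0.5", "tcp_dst": 443}, ["output:2"]))
print(sw.process({"ip_dst": "10.0.0.5", "tcp_dst": 443}))  # ['output:2']
print(sw.process({"ip_dst": "10.0.0.5", "tcp_dst": 22}))   # ['output:1']
print(sw.process({"ip_dst": "10.0.0.9", "tcp_dst": 443}))  # ['send_to_controller']
```

There is no routing protocol and no Dijkstra on the device; every decision it makes was computed elsewhere and installed as a table entry.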

The SDN design through the framework

The binding constraint is per-device cost plus operational complexity — manageability, not survivability. From this constraint, the four invariant answers follow:

Invariant	Design	Rationale
Coordination	Centralized controller	Single admin, full authority. Survivability tradeoff acceptable — replicate for redundancy.
State	Global in controller; switches hold only forwarding rules	No secrets — you own everything. No Dijkstra on the switch.
Time	Sub-second — controller pushes rules directly	No distributed convergence. No path exploration.
Interface	Match-action rules on any header field (not just IP prefix)	Video traffic? Match on port 443 + specific server IPs. The flexibility OSPF lacked — without inflating every device [11].
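The controller side can be sketched the same way: with the global topology in one place, path computation is an ordinary graph search, and policy becomes a choice of which path a traffic class gets. The topology, port numbers, server address, and helper names are hypothetical. Video traffic (matched, as in the table, on TCP port 443 toward a server) is steered around switch s2, while other traffic takes the fewest-hop path, and only switches on each path receive rules.

```python
from collections import deque

# Hypothetical global view in the controller: switch -> {neighbor switch: local output port}.
TOPOLOGY = {
    "s1": {"s2": 1, "s3": 2},
    "s2": {"s1": 1, "s4": 2},
    "s3": {"s1": 1, "s5": 2},
    "s4": {"s2": 1, "s5": 2},
    "s5": {"s3": 1, "s4": 2},
}

def bfs_path(topo, src, dst, avoid=frozenset()):
    """Fewest-hop path over the global topology, skipping switches listed in 'avoid'."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in topo[node]:
            if nbr not in prev and nbr not in avoid:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def rules_for(path, match, priority, topo):
    """One (priority, match, output port) rule per switch that forwards along the path."""
    return {sw: (priority, match, topo[sw][nxt]) for sw, nxt in zip(path, path[1:])}

server = {"ip_dst": "10.0.0.4"}
default_rules = rules_for(bfs_path(TOPOLOGY, "s1", "s4"), server, 10, TOPOLOGY)
video_rules = rules_for(bfs_path(TOPOLOGY, "s1", "s4", avoid={"s2"}),
                        {**server, "tcp_dst": 443}, 20, TOPOLOGY)
print({sw: rule[2] for sw, rule in default_rules.items()})  # {'s1': 1, 's2': 2}
print({sw: rule[2] for sw, rule in video_rules.items()})    # {'s1': 2, 's3': 2, 's5': 2}
```

The egress rule that delivers packets from s4 to the server is omitted. Changing the video policy means recomputing in one place and pushing new rules, rather than editing configuration on every router.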

The result is SDN/OpenFlow

The design matches OpenFlow as described by McKeown et al. (2008): “OpenFlow provides an open protocol to program the flow-table in different switches and routers” [11].

Invariant	OSPF (coupled)	Design from framework	SDN/OpenFlow (2008 onward)
Coordination	Distributed — each router decides	Centralized controller	NOX, ONOS, ODL
State	Full topology in every router	Global in controller; switches hold only rules	Network Information Base
Time	Distributed convergence (seconds)	Controller pushes rules (ms)	Flow setup: milliseconds
Interface	Per-prefix forwarding (limited)	Match on any header field (flexible)	OpenFlow match-action tables

The intellectual lineage traces through three papers, each addressing a different invariant:

Feamster et al. (2004) — “The Case for Separating Routing from Routers” — focused on State: centralize the routing database [12].
Casado et al. (2007) — Ethane — focused on Coordination: centralize policy enforcement [10].
McKeown et al. (2008) — OpenFlow — focused on Interface: standardize the controller-switch abstraction [11].

All three were motivated by the same constraint shift: from survivability to manageability.

Two evolutions from one baseline

The two exercises reveal how a single framework generates radically different architectures from different constraint shifts:

	OSPF (baseline)	BGP	SDN
Context	Intra-domain, cooperative	Inter-domain, commercial	Intra-domain, scale + flexibility
What broke	(baseline)	State: topology becomes a trade secret	Coordination: control/data coupling → cost bloat + inflexibility
Binding constraint	Survivability	Commercial sovereignty	Per-device cost + policy precision
Key principle	Closed-loop (LS flooding)	Closed-loop (slow updates, measurement under privacy)	Disaggregation (control from data plane)
Coordination	Distributed	Distributed (sovereign)	Centralized
State	Full topology	Filtered paths (permanent gap)	Global (controller)

BGP: trust changes, the State answer changes, coordination stays distributed. SDN: cost and flexibility change, deeper disaggregation occurs, coordination centralizes. The same framework diagnosed both and generated both from constraint shifts alone.

Forward: from logical networks to physical media

Both BGP and SDN operate on wired networks where communication is point-to-point — a packet sent on a fiber or copper link reaches exactly one destination. The shared-medium problem does not arise because the medium is not shared. Starting in Lecture 5, the domain shifts to wireless, where transmission is inherently broadcast and the medium is shared by all devices within range. Physics denies the most basic form of feedback: a wireless transmitter cannot hear what is happening to its own transmission. The four-invariant framework carries forward unchanged, but the binding constraint becomes physical — shared spectrum, not commercial trust or economic cost. The question becomes: how do you coordinate access to a medium that everyone hears but no one fully observes?

References

[1] P. Baran, “On Distributed Communications Networks,” IEEE Trans. Communications Systems, vol. CS-12, no. 1, pp. 1–9, March 1964.

[2] F. E. Heart, R. E. Kahn, S. M. Ornstein, W. R. Crowther, and D. C. Walden, “The Interface Message Processor for the ARPA Computer Network,” Proc. AFIPS Spring Joint Computer Conference, pp. 551–567, 1970.

[3] J. M. McQuillan, I. Richer, and E. C. Rosen, “The New Routing Algorithm for the ARPANET,” IEEE Trans. Communications, vol. COM-28, no. 5, pp. 711–719, May 1980.

[4] R. Bellman, “On a Routing Problem,” Quarterly of Applied Mathematics, vol. 16, no. 1, pp. 87–90, 1958.

[5] C. Hedrick, “Routing Information Protocol,” RFC 1058, June 1988.

[6] J. Moy, “OSPF Version 2,” RFC 2328, April 1998.

[7] Y. Rekhter, T. Li, and S. Hares, “A Border Gateway Protocol 4 (BGP-4),” RFC 4271, January 2006.

[8] T. G. Griffin, F. B. Shepherd, and G. Wilfong, “The Stable Paths Problem and Interdomain Routing,” IEEE/ACM Trans. Networking, vol. 10, no. 2, pp. 232–243, April 2002.

[9] L. Gao and J. Rexford, “Stable Internet Routing Without Global Coordination,” IEEE/ACM Trans. Networking, vol. 9, no. 6, pp. 681–692, December 2001.

[10] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker, “Ethane: Taking Control of the Enterprise,” Proc. ACM SIGCOMM, pp. 1–12, 2007.

[11] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling Innovation in Campus Networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, April 2008.

[12] N. Feamster, H. Balakrishnan, J. Rexford, A. Shaikh, and J. van der Merwe, “The Case for Separating Routing from Routers,” Proc. ACM SIGCOMM Workshop on Future Directions in Network Architecture (FDNA), 2004.