Design BGP and SDN From Scratch

The Framework as a Design Tool

Arpit Gupta

2026-04-09

Promise

You have the complete framework: four invariants, three principles, E-M-B decomposition.

Today you use it to design two real systems from scratch.

Exercise 1: OSPF works inside one organization. The Internet commercializes. OSPF breaks across organizations. Design what replaces it. → BGP

Exercise 2: OSPF works but doesn’t scale economically. Networks grow massive. Per-device cost explodes. Design what replaces the architecture. → SDN/OpenFlow

Two evolutions from the same baseline. Same framework generates both.

Exercise 1: Design BGP

OSPF: the baseline through first principles

Let’s visualize OSPF’s dependency graph. Help me fill this in — recall from Tuesday.

Binding constraint: survivability — the network must work despite failures (Baran, 1964)

| Invariant | OSPF’s answer | Forced by | Design principle |
|---|---|---|---|
| Coordination | Distributed — each router computes independently | Survivability → no central point of failure | Decision placement |
| State | Full topology — every router knows every link and cost | Distributed → need shared truth to avoid loops | Disaggregation: measurement from belief |
| Time | Event-driven flooding, sub-second convergence | Full topology → changes trigger immediate reflooding | Closed-loop: fast, honest feedback |
| Interface | LSAs — the format for exchanging raw link measurements | Cooperative trust → share everything, hide nothing | Honest measurement signal |
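The State and Coordination rows can be sketched together: every router holds the same link-state database built from flooded LSAs, and each one independently runs Dijkstra over it. A minimal sketch, assuming a toy three-router topology with hypothetical costs:

```python
import heapq

def dijkstra(lsdb, source):
    """Shortest-path distances over the shared link-state database.

    lsdb: {router: {neighbor: cost}}. Every router holds the same copy
    (flooded via LSAs), so every router computes consistent routes
    independently - distributed coordination without a central point.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in lsdb.get(u, {}).items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

# Toy topology assembled from flooded LSAs ("Link A-B, cost 10", ...)
lsdb = {"A": {"B": 10, "C": 5}, "B": {"A": 10, "C": 2}, "C": {"A": 5, "B": 2}}
print(dijkstra(lsdb, "A"))  # A reaches B via C at cost 7, not directly at 10
```

Note what the inputs are: raw link costs, nothing hidden. That openness is the assumption the next slides break.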

What assumption holds this together?

The environment evolves: commercialization

1989: ARPANET decommissioned. The Internet becomes an interconnection of independent networks.

This is fundamentally different from ARPANET:

| | ARPANET | Commercial Internet |
|---|---|---|
| Who operates it | One cooperative research community | Thousands of organizations — AT&T, Sprint, universities, corporations |
| Relationship | Collaborative — shared goals | Commercial — competing business interests |
| The problem | Route within one backbone | Organizations with disparate interests must figure out how to exchange traffic |

The tool at hand is OSPF. Is it the right tool?

Look at OSPF’s four invariant answers. Which one becomes impossible when separate commercial organizations must route between each other?

HOW does OSPF’s State answer break?

Recall OSPF’s State answer: full topology shared — every router sees every link via LSAs.

This means every router floods: "Link A-B, cost 10" — raw topology, nothing hidden.

Would AT&T flood its internal topology to Sprint? Its 500 routers, link capacities, traffic engineering policies?

No. Commercial competitors treat topology as a trade secret. OSPF’s State answer — share everything honestly — requires trust that no longer exists.

OSPF can still work inside each organization — single admin, full trust. But it fails across organizations. This means routing has to split: one system inside, a different system between.

What does this separation create? Who handles the boundary?

The boundary creates gateway routers — and a new protocol

The State failure forces a split: intra-domain (OSPF, full trust) vs inter-domain (new protocol, filtered trust).

This creates gateway routers — routers at the AS boundary that speak both protocols. Inside: OSPF. Outside: the new inter-domain protocol.

Now: what does the gateway router advertise to other ASes?

OSPF advertises link states: "Link A-B, cost 10" — router-level. But we’re hiding internal routers. The outside world doesn’t know AT&T’s router names. What’s the right unit?

Prefixes — groups of IP addresses the AS is responsible for: 208.65.152.0/22

Plus the AS-level path for loop detection: [AS7018, AS3356, AS15169]

This is a path vector — more than a distance (DV), less than a topology (OSPF). The maximum competitors will disclose.
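The AS-level path earns its keep as the loop detector: an AS rejects any advertisement whose AS_PATH already contains its own number. A minimal sketch (AS numbers taken from the example above):

```python
def accept_route(my_asn, prefix, as_path):
    """Path-vector loop detection: reject any advertisement whose
    AS_PATH already contains our own AS number - the loop must have
    passed through us once already."""
    return my_asn not in as_path

# AS7018 sees itself in the path: the route looped back, so discard it
print(accept_route(7018, "208.65.152.0/22", [3356, 7018, 15169]))  # False
print(accept_route(7018, "208.65.152.0/22", [3356, 15169]))        # True
```

This is why the path vector needs no shared topology: loop freedom comes from the path itself, not from a consistent global map.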

The E-M-B gap — structurally filtered

| | OSPF (cooperative) | Inter-domain (path vector) |
|---|---|---|
| What’s shared | Every link, every cost | Prefixes + AS-level path |
| What’s hidden | Nothing | Internal topology, capacity, congestion, cost |
| E-M-B gap | None — measurement = environment | Permanent — by design |

Is this the same type of gap as bufferbloat (accidentally noisy)? As count-to-infinity (circular belief)? Or something fundamentally different?

Time and Selection

Time ← the loop runs across thousands of ASes exchanging prefix+path updates. How fast should it run? DV updated every 128 ms and oscillated; OSPF floods within seconds inside a single administrative domain.

Slow — 30-second minimum between updates for the same prefix. Stability over speed. Cost: 3-15 min convergence after failures. (Closed-loop: learned from DV’s mistake)
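The 30-second floor is a per-prefix rate limiter on outgoing updates. A minimal sketch, assuming a simplified timer keyed only by prefix (class and method names are illustrative, not BGP implementation API):

```python
import time

MRAI = 30  # seconds: minimum interval between updates for the same prefix

class UpdateRateLimiter:
    """Per-prefix minimum advertisement interval: suppress any update
    sent sooner than MRAI seconds after the previous one for that
    prefix. Stability over speed - DV's lesson."""
    def __init__(self):
        self.last_sent = {}
    def may_send(self, prefix, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_sent.get(prefix, float("-inf")) >= MRAI:
            self.last_sent[prefix] = now
            return True
        return False

rl = UpdateRateLimiter()
print(rl.may_send("208.65.152.0/22", now=0))   # True:  first update goes out
print(rl.may_send("208.65.152.0/22", now=10))  # False: suppressed, inside MRAI
print(rl.may_send("208.65.152.0/22", now=35))  # True:  interval elapsed
```

The cost is visible in the sketch: a legitimate change at t=10 waits until the window reopens, which is exactly where the minutes-long convergence comes from.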

Selection ← OSPF picks shortest path. But commercial ASes have business preferences — a paying customer’s longer path beats a competitor’s shorter path.

Business preference (LOCAL_PREF) overrides shortest path. The protocol enforces business relationships, not optimal routing. (Decision placement: each AS applies local policy)
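The selection rule above can be sketched as a two-key comparison: highest LOCAL_PREF first, shortest AS_PATH only as a tie-breaker (real BGP has further tie-breakers below these; the route dictionaries are illustrative):

```python
def best_route(routes):
    """Simplified BGP decision process: highest LOCAL_PREF wins;
    ties broken by shortest AS_PATH. Business policy outranks
    path length."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

routes = [
    # shorter path, but learned from a competitor (low preference)
    {"via": "competitor", "local_pref": 80,  "as_path": [3356, 15169]},
    # longer path, but a paying customer (high preference)
    {"via": "customer",   "local_pref": 200, "as_path": [64500, 3356, 15169]},
]
print(best_route(routes)["via"])  # customer
```

Shortest path never gets consulted here: the LOCAL_PREF comparison decides before AS_PATH length matters. That ordering is the protocol enforcing business relationships.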

You just designed BGP

| Invariant | Your design | BGP (RFC 1105, 1989) | Design principle |
|---|---|---|---|
| Coordination | Each AS decides by local business policy | LOCAL_PREF > AS_PATH length > tie-breakers | Decision placement: maximally distributed |
| State | Path vector — prefix + AS path (structurally filtered) | AS_PATH + policy attributes; permanent E-M-B gap | Closed-loop: design measurement signal given privacy |
| Time | Slow updates for stability | 30-second minimum between updates; 3-15 min convergence | Closed-loop: stability over speed (DV’s lesson) |
| Interface | Disaggregated from internal routing | BGP ↔ OSPF/IS-IS boundary | Disaggregation: inter-domain from intra-domain |

OSPF → BGP: State broke because trust changed. Every other answer adapted.

BGP’s deepest lesson

Griffin (2002): BGP stability with arbitrary policies is NP-complete.

Gao-Rexford (2001): BGP converges if policies follow the customer-provider hierarchy.

Stability comes from economic structure, not protocol design. The E-M-B gap is permanent. The system works because institutional constraints — the market hierarchy — keep policies aligned.

Consequence of no verification: Pakistan Telecom announces YouTube’s prefix (2008). BGP accepts it — no origin authentication. YouTube goes dark globally for 2 hours. RPKI (2012) partially fixes origin validation — same deployability meta-constraint.
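What RPKI adds can be sketched as a lookup: a ROA binds a prefix to the AS authorized to originate it, and a received announcement is checked against covering ROAs. A minimal sketch using the 2008 incident’s prefixes and AS numbers for illustration; the `roas` table is assumed, and real validation also honors the ROA’s maxLength, which this omits:

```python
import ipaddress

# Illustrative ROA table: prefix -> authorized origin AS
# (AS36561 was YouTube's AS; AS17557 was Pakistan Telecom's)
roas = {ipaddress.ip_network("208.65.152.0/22"): 36561}

def origin_valid(prefix, origin_asn):
    """Valid if a covering ROA authorizes this origin; invalid if a
    ROA covers the prefix but names a different AS; unknown if no
    ROA covers it. (Simplified: ignores maxLength.)"""
    net = ipaddress.ip_network(prefix)
    for roa_net, roa_asn in roas.items():
        if net.subnet_of(roa_net):
            return "valid" if origin_asn == roa_asn else "invalid"
    return "unknown"

print(origin_valid("208.65.153.0/24", 17557))  # invalid - the 2008 hijack pattern
print(origin_valid("208.65.153.0/24", 36561))  # valid
```

Note the third outcome: with no covering ROA the answer is "unknown", not "invalid", which is why partial RPKI deployment only partially closes the hole.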

A tension is forming. BGP selects one best path per prefix — stability demands simplicity. Yet operators want finer control: route video one way, bulk transfers another. Every additional policy rule (communities, route maps, prefix-list filters) adds complexity per device — and complexity threatens the stability Gao-Rexford guarantees. Precision competes with stability. Control competes with cost. This same tension will reappear inside organizations — and it will break the architecture.


Exercise 2: Design SDN

OSPF inside an organization: what didn’t scale

Go back to OSPF — inside a single organization (datacenter, campus, WAN). Same dependency graph as slide 4. But a different problem emerged.

Three forces converged:

1. Scale exploded. Cloud providers built datacenters with tens of thousands of switches. Each device running link-state + Dijkstra + line-rate forwarding = compute and memory bloat. Routers cost $500K+. Cisco/Juniper monopoly pricing — organizations trapped.

2. Precision demanded bloat. OSPF routes by shortest path to IP prefix. Organizations wanted: video on path A, conferencing on path B, bulk transfers on path C. Per-application rules require more FIB/TCAM entries per device → more memory → higher cost. Every additional rule inflates every device.

3. Policy required touching every device. Each router owns its own control plane. Changing routing policy = reconfigure every router individually. Casado (2007): human error accounted for 62% of network downtime — because network-wide policy was expressed through thousands of lines of local configuration on individual devices.

Which invariant is under pressure?

Three forces: scale bloats per-device cost. Precision demands more rules per device. Policy requires touching every device individually.

All three trace back to one architectural choice: every router computes its own control plane. Route computation, policy expression, traffic engineering — distributed across every device.

Which invariant is that? And if it’s the problem — what’s the alternative?

But wait — in Lecture 3, Baran proved distributed coordination was essential for survivability. Doesn’t centralizing routing create a single point of failure?

What changed since Baran?

“Network management is complex and requires strong consistency, making it quite hard to compute in a distributed manner.” — Casado et al., 2007

Baran’s context (1964): a national network across hostile territory. Survive nuclear attack. Centralization = one bomb destroys routing.

SDN’s context (2004): a datacenter or campus you fully control. You own every device, every link, every power supply.

Inside your own domain: you can replicate controllers, add fast failover, monitor health. Logically centralized ≠ physically centralized.

And the data showed: distributed configuration caused more downtime (62% from human error) than centralization risked. The real threat was no longer physical destruction — it was operational complexity.

What would you pull out of the router?

“Why should network-wide routing decisions be implemented through thousands of lines of local configuration on individual, distributed devices?” — Feamster et al., 2004

The control plane and data plane are coupled in every device. Route computation, policy, traffic engineering — all crammed into the same $500K box. Same monolithic-IMP pattern from Lecture 3.

Heart separated forwarding from routing in 1969 — same box, different processes. What’s the equivalent separation here?

Design SDN: trace the chain

Binding constraint: per-device cost + operational complexity unsustainable. Manageability, not survivability.

Coordination ← single admin, full authority → centralized controller. The survivability tradeoff is acceptable because you control the domain and can replicate for redundancy.

State ← controller sees full topology (no secrets — you own everything). Switches hold only forwarding rules pushed to them. No Dijkstra, no link-state database on the switch.

Time ← controller pushes rules directly → sub-second. No distributed convergence. No path exploration.

Interface ← match-action rules: match on any header field (not just IP prefix). Video traffic? Match on port 443 + specific server IPs. This is the flexibility OSPF lacked — without inflating every device.
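The Interface answer can be sketched as a priority-ordered match-action table: the controller installs rules, the switch just looks up the highest-priority match. A minimal sketch; the field names, addresses, and action strings are illustrative, not OpenFlow wire format:

```python
# Controller-installed flow table: (priority, match fields, action).
# An empty match is a wildcard catch-all rule.
flow_table = [
    (20, {"tcp_dst": 443, "ip_dst": "10.0.0.5"}, "out:pathA"),  # video server
    (10, {"tcp_dst": 443},                       "out:pathB"),  # other TLS
    (0,  {},                                     "out:default"),
]

def lookup(pkt):
    """Return the action of the highest-priority rule whose match
    fields all equal the packet's header fields."""
    for _, match, action in sorted(flow_table, key=lambda r: -r[0]):
        if all(pkt.get(k) == v for k, v in match.items()):
            return action
    return "drop"

print(lookup({"tcp_dst": 443, "ip_dst": "10.0.0.5"}))  # out:pathA
print(lookup({"tcp_dst": 443, "ip_dst": "10.0.0.9"}))  # out:pathB
print(lookup({"tcp_dst": 80}))                         # out:default
```

The key property: only switches on the video path carry the video rule. Precision no longer inflates every device, because the controller decides where each rule lives.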

You just designed SDN / OpenFlow

“OpenFlow provides an open protocol to program the forwarding table in different switches.” — McKeown et al., 2008

| Invariant | OSPF (coupled) | Your design | SDN/OpenFlow (2008) |
|---|---|---|---|
| Coordination | Distributed — each router decides | Centralized controller | NOX, ONOS, ODL |
| State | Full topology in every router | Global in controller; switches hold only rules | Network Information Base |
| Time | Distributed convergence (seconds) | Controller pushes rules (ms) | Flow setup: milliseconds |
| Interface | Per-prefix forwarding (limited) | Match on any header field (flexible) | OpenFlow match-action tables |

Heart (1969): separated forwarding from routing — same box, different processes.

SDN (2008): separated control plane from data plane — different devices entirely.

Same disaggregation principle. Applied more aggressively because the cost constraint demanded it.

Summary

OSPF → BGP, OSPF → SDN: two evolutions, one baseline

| | OSPF (baseline) | BGP | SDN |
|---|---|---|---|
| Context | Intra-domain, cooperative | Inter-domain, commercial | Intra-domain, scale + flexibility |
| What broke | (baseline) | State — topology becomes a trade secret | Control/data coupling → cost bloat + inflexibility |
| Binding constraint | Survivability | Commercial sovereignty | Per-device cost + policy precision |
| Key principle | Closed-loop (LS flooding) | Closed-loop (slow updates, measurement under privacy) | Disaggregation (control from data plane) |
| Coordination | Distributed | Distributed (sovereign) | Centralized |
| State | Full topology | Filtered paths (permanent gap) | Global (controller) |

BGP: trust changes → State answer changes, coordination stays distributed.

SDN: cost + flexibility changes → deeper disaggregation, coordination centralizes.

Next week: wireless medium access — binding constraint is physics (shared spectrum). Same framework, new substrate. Read Ch 3.