Design BGP and SDN From Scratch

The Framework as a Design Tool

Arpit Gupta

2026-04-09

Promise

You have the complete framework: four invariants, three principles, E-M-B decomposition.

Today you use it to design two real systems from scratch.

Exercise 1: OSPF works inside one organization. The Internet commercializes. OSPF breaks across organizations. Design what replaces it. → BGP

Exercise 2: OSPF works but doesn’t scale economically. Networks grow massive. Per-device cost explodes. Design what replaces the architecture. → SDN/OpenFlow

Two evolutions from the same baseline. Same framework generates both.

Exercise 1: Design BGP

OSPF: the baseline through first principles

Let’s visualize OSPF’s dependency graph. Help me fill this in — recall from Tuesday.

Binding constraint: survivability — the network must work despite failures (Baran, 1964)

| Invariant | OSPF’s answer | Forced by | Design principle |
|---|---|---|---|
| Coordination | Distributed — each router computes independently | Survivability → no central point of failure | Decision placement |
| State | Full topology — every router knows every link and cost | Distributed → need shared truth to avoid loops | Disaggregation: measurement from belief |
| Time | Event-driven flooding, sub-second convergence | Full topology → changes trigger immediate reflooding | Closed-loop: fast, honest feedback |
| Interface | LSAs — the format for exchanging raw link measurements | Cooperative trust → share everything, hide nothing | Honest measurement signal |
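The State and Coordination rows can be sketched together: every router holds the same link-state database built from flooded LSAs, and each one independently runs Dijkstra over it. A minimal sketch, assuming a toy three-router topology with hypothetical costs:

```python
import heapq

def dijkstra(lsdb, source):
    """Shortest-path distances over the shared link-state database.

    lsdb: {router: {neighbor: cost}}. Every router holds the same copy
    (flooded via LSAs), so every router computes consistent routes
    independently - distributed coordination without a central point.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in lsdb.get(u, {}).items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

# Toy topology assembled from flooded LSAs ("Link A-B, cost 10", ...)
lsdb = {"A": {"B": 10, "C": 5}, "B": {"A": 10, "C": 2}, "C": {"A": 5, "B": 2}}
print(dijkstra(lsdb, "A"))  # A reaches B via C at cost 7, not directly at 10
```

Note what the inputs are: raw link costs, nothing hidden. That openness is the assumption the next slides break.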

What assumption holds this together?

The environment evolves: commercialization

1989: ARPANET decommissioned. The Internet becomes an interconnection of independent networks.

This is fundamentally different from ARPANET:

| | ARPANET | Commercial Internet |
|---|---|---|
| Who operates it | One cooperative research community | Thousands of organizations — AT&T, Sprint, universities, corporations |
| Relationship | Collaborative — shared goals | Commercial — competing business interests |
| The problem | Route within one backbone | Organizations with disparate interests must figure out how to exchange traffic |

The tool at hand is OSPF. Is it the right tool?

Look at OSPF’s four invariant answers. Which one becomes impossible when separate commercial organizations must route between each other?

HOW does OSPF’s State answer break?

Recall OSPF’s State answer: full topology shared — every router sees every link via LSAs.

This means every router floods: "Link A-B, cost 10" — raw topology, nothing hidden.

Would AT&T flood its internal topology to Sprint? Its 500 routers, link capacities, traffic engineering policies?

No. Commercial competitors treat topology as a trade secret. OSPF’s State answer — share everything honestly — requires trust that no longer exists.

OSPF can still work inside each organization — single admin, full trust. But it fails across organizations. This means routing has to split: one system inside, a different system between.

What does this separation create? Who handles the boundary?

The boundary creates gateway routers — and a new protocol

The State failure forces a split: intra-domain (OSPF, full trust) vs inter-domain (new protocol, filtered trust).

This creates gateway routers — routers at the AS boundary that speak both protocols. Inside: OSPF. Outside: the new inter-domain protocol.

Now: what does the gateway router advertise to other ASes?

OSPF advertises link states: "Link A-B, cost 10" — router-level. But we’re hiding internal routers. The outside world doesn’t know AT&T’s router names. What’s the right unit?

Prefixes — groups of IP addresses the AS is responsible for: 208.65.152.0/22

Plus the AS-level path for loop detection: [AS7018, AS3356, AS15169]

This is a path vector — more than a distance (DV), less than a topology (OSPF). The maximum competitors will disclose.
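The AS-level path earns its keep as the loop detector: an AS rejects any advertisement whose AS_PATH already contains its own number. A minimal sketch (AS numbers taken from the example above):

```python
def accept_route(my_asn, prefix, as_path):
    """Path-vector loop detection: reject any advertisement whose
    AS_PATH already contains our own AS number - the loop must have
    passed through us once already."""
    return my_asn not in as_path

# AS7018 sees itself in the path: the route looped back, so discard it
print(accept_route(7018, "208.65.152.0/22", [3356, 7018, 15169]))  # False
print(accept_route(7018, "208.65.152.0/22", [3356, 15169]))        # True
```

This is why the path vector needs no shared topology: loop freedom comes from the path itself, not from a consistent global map.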

The E-M-B gap — structurally filtered

| | OSPF (cooperative) | Inter-domain (path vector) |
|---|---|---|
| What’s shared | Every link, every cost | Prefixes + AS-level path |
| What’s hidden | Nothing | Internal topology, capacity, congestion, cost |
| E-M-B gap | None — measurement = environment | Permanent — by design |

Is this the same type of gap as bufferbloat (accidentally noisy)? As count-to-infinity (circular belief)? Or something fundamentally different?

Time and Selection

Time ← the loop runs across thousands of ASes exchanging prefix+path updates. How fast should it run? DV updated every 128 ms and oscillated; OSPF floods within seconds inside a single administrative domain.

Slow — 30-second minimum between updates for the same prefix. Stability over speed. Cost: 3-15 min convergence after failures. (Closed-loop: learned from DV’s mistake)
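The 30-second floor is a per-prefix rate limiter on outgoing updates. A minimal sketch, assuming a simplified timer keyed only by prefix (class and method names are illustrative, not BGP implementation API):

```python
import time

MRAI = 30  # seconds: minimum interval between updates for the same prefix

class UpdateRateLimiter:
    """Per-prefix minimum advertisement interval: suppress any update
    sent sooner than MRAI seconds after the previous one for that
    prefix. Stability over speed - DV's lesson."""
    def __init__(self):
        self.last_sent = {}
    def may_send(self, prefix, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_sent.get(prefix, float("-inf")) >= MRAI:
            self.last_sent[prefix] = now
            return True
        return False

rl = UpdateRateLimiter()
print(rl.may_send("208.65.152.0/22", now=0))   # True:  first update goes out
print(rl.may_send("208.65.152.0/22", now=10))  # False: suppressed, inside MRAI
print(rl.may_send("208.65.152.0/22", now=35))  # True:  interval elapsed
```

The cost is visible in the sketch: a legitimate change at t=10 waits until the window reopens, which is exactly where the minutes-long convergence comes from.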

Selection ← OSPF picks shortest path. But commercial ASes have business preferences — a paying customer’s longer path beats a competitor’s shorter path.

Business preference (LOCAL_PREF) overrides shortest path. The protocol enforces business relationships, not optimal routing. (Decision placement: each AS applies local policy)
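The selection rule above can be sketched as a two-key comparison: highest LOCAL_PREF first, shortest AS_PATH only as a tie-breaker (real BGP has further tie-breakers below these; the route dictionaries are illustrative):

```python
def best_route(routes):
    """Simplified BGP decision process: highest LOCAL_PREF wins;
    ties broken by shortest AS_PATH. Business policy outranks
    path length."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

routes = [
    # shorter path, but learned from a competitor (low preference)
    {"via": "competitor", "local_pref": 80,  "as_path": [3356, 15169]},
    # longer path, but a paying customer (high preference)
    {"via": "customer",   "local_pref": 200, "as_path": [64500, 3356, 15169]},
]
print(best_route(routes)["via"])  # customer
```

Shortest path never gets consulted here: the LOCAL_PREF comparison decides before AS_PATH length matters. That ordering is the protocol enforcing business relationships.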

You just designed BGP

| Invariant | Your design | BGP (RFC 1105, 1989) | Design principle |
|---|---|---|---|
| Coordination | Each AS decides by local business policy | LOCAL_PREF > AS_PATH length > tie-breakers | Decision placement: maximally distributed |
| State | Path vector — prefix + AS path (structurally filtered) | AS_PATH + policy attributes; permanent E-M-B gap | Closed-loop: design measurement signal given privacy |
| Time | Slow updates for stability | 30-second minimum between updates; 3-15 min convergence | Closed-loop: stability over speed (DV’s lesson) |
| Interface | Disaggregated from internal routing | BGP ↔ OSPF/IS-IS boundary | Disaggregation: inter-domain from intra-domain |

OSPF → BGP: State broke because trust changed. Every other answer adapted.

BGP’s deepest lesson

Griffin (2002): BGP stability with arbitrary policies is NP-complete.

Gao-Rexford (2001): BGP converges if policies follow the customer-provider hierarchy.

Stability comes from economic structure, not protocol design. The E-M-B gap is permanent. The system works because institutional constraints — the market hierarchy — keep policies aligned.

Consequence of no verification: Pakistan Telecom announces YouTube’s prefix (2008). BGP accepts it — no origin authentication. YouTube goes dark globally for 2 hours. RPKI (2012) partially fixes origin validation — same deployability meta-constraint.
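What RPKI adds can be sketched as a lookup: a ROA binds a prefix to the AS authorized to originate it, and a received announcement is checked against covering ROAs. A minimal sketch using the 2008 incident’s prefixes and AS numbers for illustration; the `roas` table is assumed, and real validation also honors the ROA’s maxLength, which this omits:

```python
import ipaddress

# Illustrative ROA table: prefix -> authorized origin AS
# (AS36561 was YouTube's AS; AS17557 was Pakistan Telecom's)
roas = {ipaddress.ip_network("208.65.152.0/22"): 36561}

def origin_valid(prefix, origin_asn):
    """Valid if a covering ROA authorizes this origin; invalid if a
    ROA covers the prefix but names a different AS; unknown if no
    ROA covers it. (Simplified: ignores maxLength.)"""
    net = ipaddress.ip_network(prefix)
    for roa_net, roa_asn in roas.items():
        if net.subnet_of(roa_net):
            return "valid" if origin_asn == roa_asn else "invalid"
    return "unknown"

print(origin_valid("208.65.153.0/24", 17557))  # invalid - the 2008 hijack pattern
print(origin_valid("208.65.153.0/24", 36561))  # valid
```

Note the third outcome: with no covering ROA the answer is "unknown", not "invalid", which is why partial RPKI deployment only partially closes the hole.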

A tension is forming. BGP selects one best path per prefix — stability demands simplicity. Yet operators want finer control: route video one way, bulk transfers another. Every additional policy rule (communities, route maps, prefix-list filters) adds complexity per device — and complexity threatens the stability Gao-Rexford guarantees. Precision competes with stability. Control competes with cost. This same tension will reappear inside organizations — and it will break the architecture.


Exercise 2: Design SDN

OSPF inside an organization: what didn’t scale

Go back to OSPF — inside a single organization (datacenter, campus, WAN). Same dependency graph as slide 4. But a different problem emerged.

Three forces converged:

1. Scale exploded. Cloud providers built datacenters with tens of thousands of switches. Each device running link-state + Dijkstra + line-rate forwarding = compute and memory bloat. Routers cost $500K+. Cisco/Juniper monopoly pricing — organizations trapped.

2. Precision demanded bloat. OSPF routes by shortest path to IP prefix. Organizations wanted: video on path A, conferencing on path B, bulk transfers on path C. Per-application rules require more FIB/TCAM entries per device → more memory → higher cost. Every additional rule inflates every device.

3. Policy required touching every device. Each router owns its own control plane. Changing routing policy = reconfigure every router individually. Casado (2007): human error accounted for 62% of network downtime — because network-wide policy was expressed through thousands of lines of local configuration on individual devices.

Which invariant is under pressure?

Three forces: scale bloats per-device cost. Precision demands more rules per device. Policy requires touching every device individually.

All three trace back to one architectural choice: every router computes its own control plane. Route computation, policy expression, traffic engineering — distributed across every device.

Which invariant is that? And if it’s the problem — what’s the alternative?

But wait — in Lecture 3, Baran proved distributed coordination was essential for survivability. Doesn’t centralizing routing create a single point of failure?

What changed since Baran?

“Network management is complex and requires strong consistency, making it quite hard to compute in a distributed manner.” — Casado et al., 2007

Baran’s context (1964): a national network across hostile territory. Survive nuclear attack. Centralization = one bomb destroys routing.

SDN’s context (2004): a datacenter or campus you fully control. You own every device, every link, every power supply.

Inside your own domain: you can replicate controllers, add fast failover, monitor health. Logically centralized ≠ physically centralized.

And the data showed: distributed configuration caused more downtime (62% from human error) than centralization risked. The real threat was no longer physical destruction — it was operational complexity.

What would you pull out of the router?

“Why should network-wide routing decisions be implemented through thousands of lines of local configuration on individual, distributed devices?” — Feamster et al., 2004

The control plane and data plane are coupled in every device. Route computation, policy, traffic engineering — all crammed into the same $500K box. Same monolithic-IMP pattern from Lecture 3.

Heart separated forwarding from routing in 1969 — same box, different processes. What’s the equivalent separation here?

Design SDN: trace the chain

Binding constraint: per-device cost + operational complexity unsustainable. Manageability, not survivability.

Coordination ← single admin, full authority → centralized controller. The survivability tradeoff is acceptable because you control the domain and can replicate for redundancy.

State ← controller sees full topology (no secrets — you own everything). Switches hold only forwarding rules pushed to them. No Dijkstra, no link-state database on the switch.

Time ← controller pushes rules directly → sub-second. No distributed convergence. No path exploration.

Interface ← match-action rules: match on any header field (not just IP prefix). Video traffic? Match on port 443 + specific server IPs. This is the flexibility OSPF lacked — without inflating every device.
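The Interface answer can be sketched as a priority-ordered match-action table: the controller installs rules, the switch just looks up the highest-priority match. A minimal sketch; the field names, addresses, and action strings are illustrative, not OpenFlow wire format:

```python
# Controller-installed flow table: (priority, match fields, action).
# An empty match is a wildcard catch-all rule.
flow_table = [
    (20, {"tcp_dst": 443, "ip_dst": "10.0.0.5"}, "out:pathA"),  # video server
    (10, {"tcp_dst": 443},                       "out:pathB"),  # other TLS
    (0,  {},                                     "out:default"),
]

def lookup(pkt):
    """Return the action of the highest-priority rule whose match
    fields all equal the packet's header fields."""
    for _, match, action in sorted(flow_table, key=lambda r: -r[0]):
        if all(pkt.get(k) == v for k, v in match.items()):
            return action
    return "drop"

print(lookup({"tcp_dst": 443, "ip_dst": "10.0.0.5"}))  # out:pathA
print(lookup({"tcp_dst": 443, "ip_dst": "10.0.0.9"}))  # out:pathB
print(lookup({"tcp_dst": 80}))                         # out:default
```

The key property: only switches on the video path carry the video rule. Precision no longer inflates every device, because the controller decides where each rule lives.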

You just designed SDN / OpenFlow

“OpenFlow provides an open protocol to program the forwarding table in different switches.” — McKeown et al., 2008

| Invariant | OSPF (coupled) | Your design | SDN/OpenFlow (2008) |
|---|---|---|---|
| Coordination | Distributed — each router decides | Centralized controller | NOX, ONOS, ODL |
| State | Full topology in every router | Global in controller; switches hold only rules | Network Information Base |
| Time | Distributed convergence (seconds) | Controller pushes rules (ms) | Flow setup: milliseconds |
| Interface | Per-prefix forwarding (limited) | Match on any header field (flexible) | OpenFlow match-action tables |

Heart (1969): separated forwarding from routing — same box, different processes.

SDN (2008): separated control plane from data plane — different devices entirely.

Same disaggregation principle. Applied more aggressively because the cost constraint demanded it.

Summary

OSPF → BGP, OSPF → SDN: two evolutions, one baseline

| | OSPF (baseline) | BGP | SDN |
|---|---|---|---|
| Context | Intra-domain, cooperative | Inter-domain, commercial | Intra-domain, scale + flexibility |
| What broke | (baseline) | State — topology becomes a trade secret | Control/data coupling → cost bloat + inflexibility |
| Binding constraint | Survivability | Commercial sovereignty | Per-device cost + policy precision |
| Key principle | Closed-loop (LS flooding) | Closed-loop (slow updates, measurement under privacy) | Disaggregation (control from data plane) |
| Coordination | Distributed | Distributed (sovereign) | Centralized |
| State | Full topology | Filtered paths (permanent gap) | Global (controller) |

BGP: trust changes → State answer changes, coordination stays distributed.

SDN: cost + flexibility changes → deeper disaggregation, coordination centralizes.

Next week: wireless medium access — binding constraint is physics (shared spectrum). Same framework, new substrate. Read Ch 3.