4 Wireless Architecture — Disaggregation at Infrastructure Scale
4.1 The Monolith and Its Escape
In 2G cellular networks, the base station was a monolith: radio transmission, channel coding, mobility control, and billing all coupled in one box connected to one mobile switching center (MSC). The MSC itself was monolithic—voice switching, call routing, subscriber lookup, charging, all in the same appliance. This architecture answered the four invariants with solutions tightly bound to hardware: state was tied to specific cells, time was prescribed (circuit switching: fixed bandwidth, long holding times), coordination was centralized (one MSC per region), interfaces were proprietary (vendor lock-in).
The architecture worked for voice—a stable, symmetric, long-holding service. But when demand shifted toward bursty packet data, the monolith cracked. Voice calls assume fixed bandwidth reservations; packet data is asymmetric, variable-rate, bursty. Forcing bursty packet traffic into a circuit-switched model was inefficient and structurally wrong. The escape was disaggregation: separate radio access from mobility from core switching from billing. By 5G, disaggregation applies at every architectural layer. The central narrative is simple: disaggregation is how you transform a monolithic system constrained by physics and history into a composable platform.
Wireless architecture sits at the boundary between two decomposition axes: the functional split (control plane vs. data plane) and the temporal split (fast PHY decisions vs. slow management decisions). The 5G RAN disaggregation (CU/DU/RU) is an explicit re-decomposition driven by changing latency and cost constraints.
The anchor constraint that drives disaggregation is convergence on a unified transport. Once all traffic—voice, video, data—is carried as IP packets (not fixed-bandwidth circuits), you can separate concerns: who manages the radio? (RAN). Who routes packets? (core network). Who enforces policy? (separate from routing). Who anchors the user’s IP address? (separate from policy). Each separation is an opportunity for independent scaling, independent optimization, independent deployment. This is disaggregation as a principle: partition coupled concerns so each can be answered independently, constrained only by its interface.
4.2 Cellular Architecture Evolution: From Monolith to Microservices
4.2.1 The Three Architectures
2G (GSM, 1991): Monolithic base station (BTS) + monolithic switching center (MSC). The BTS knows radio (channel assignment, power control, handoff). The MSC knows subscribers (HLR lookup), call routing, billing. Tight coupling: a subscriber’s registration is in the MSC’s HLR; the BTS cannot function independently.
3G (UMTS, early 2000s): Bifurcation emerges. Voice still uses the circuit MSC. But packet data bypasses it: a new path SGSN (Serving GPRS Support Node) → GGSN (Gateway GPRS Support Node) handles IP packets. The base station (Node-B) now branches: voice packets go to MSC, data packets go to SGSN. State bifurcates: circuit state (MSC path) and packet state (SGSN/GGSN path) coexist but do not interact. The architecture admits packet data by adding parallel infrastructure, not replacing the circuit core. This is incremental disaggregation under backward-compatibility constraints.
4G (LTE, 2010s): Unification. All traffic is IP—voice becomes VoIP, data is native IP packets. The circuit MSC vanishes. The core network unifies around IP: eNodeB (evolved base station) connects to MME (Mobility Management Entity), S-GW (Serving Gateway), P-GW (Packet Gateway), HSS (Home Subscriber Server). Each function specializes: MME handles attachment and handoff; S-GW anchors the data path as users move between base stations; P-GW anchors the user's IP address and is the gateway to the Internet; HSS stores subscriptions. State is distributed across functions, but traffic is unified.
5G (2020s): Atomization. Core functions disaggregate into stateless microservices—AMF (Access and Mobility Management Function), SMF (Session Management Function), UPF (User Plane Function), UDM (Unified Data Management), PCF (Policy Control Function), NRF (Network Repository Function). Each is independent software running on commodity cloud infrastructure. Functions are invoked via REST APIs, not proprietary protocols. State lives in databases, not in memory. This is the endpoint of disaggregation: functions are no longer coupled to hardware; they are pure software on commodity infrastructure.
4.2.2 The Invariant Answers Shift at Each Generation
State: Progressively distributed and decoupled from hardware.
- 2G: state bound to a specific MSC (home network) and BTS (current serving cell).
- 3G: radio state (Node-B) separate from packet state (SGSN/GGSN).
- 4G: radio state (eNB), access state (MME), session state (S-GW), subscription state (HSS)—distributed across functions but still bound to specific hardware appliances.
- 5G: state is function-independent. One AMF instance holds registration state for some users; another holds it for others. State is stored in databases (UDM, policy stores). If an AMF crashes, another instance resumes with state reloaded.
Time: Feedback loops accelerate.
- 2G: attachment takes seconds; handoff is rare (minutes apart). Billing is computed offline, hours post-call.
- 3G: attachment sub-second; handoff more frequent (seconds apart). Policy changes within minutes.
- 4G: attachment <1 s; handoff frequent (every few seconds). Policy changes in seconds.
- 5G: attachment <1 s; handoff <100 ms; policy changes in real time. Distribution enables parallelism—policy changes to one function do not block others.
Coordination: Centralization gives way to distributed choreography.
- 2G: the home MSC decides everything.
- 3G: HSS centralizes subscription data; all nodes query it.
- 4G: MME coordinates attachment/handoff; other nodes follow instructions.
- 5G: decentralized by design. AMF makes attachment decisions independently; SMF makes session decisions independently; PCF sets policy, which UPF enforces independently. Functions use a service registry (NRF) to discover each other and make localized decisions without central coordination.
Interface: Proprietary protocols give way to cloud-native APIs.
- 2G: proprietary circuit protocols (A-bis, MAP).
- 3G: added packet protocols (Gn, Gi); MAP persists.
- 4G: all data is IP; the control plane uses Diameter and GTP signaling.
- 5G: service-based interfaces (HTTP/REST APIs). Functions call each other's APIs (N10, N11, etc.). This is not a networking innovation—it is the adoption of web architecture in telecom.
4.3 CU/DU/RU Disaggregation: The Radio Functional Split
While the core network disaggregates into microservices, the radio access network (RAN) disaggregates along the time axis. The base station—which contains radio transmission (RF), signal processing (Layer 1: modulation, coding), and control logic (Layer 2-3: scheduling, mobility)—splits into three units based on latency sensitivity.
4.3.1 The Split Points
The split is not arbitrary—it is constrained by the physical requirement that RF transmission stay near the antenna, while control can migrate centrally if transport delays permit. This creates a spectrum of options: aggressive centralization (Option 7) requires premium fronthaul; conservative splits (Option 6) work with microwave backhaul. Figure 4.1 compares the split options, showing the tradeoff between centralization benefit and fronthaul cost.
RU (Radio Unit): At the antenna. Handles RF transmission, digital-to-analog conversion, the lowest-latency operations. Cannot be centralized—physics requires it near the antenna. Must operate at 1-10 microsecond latencies.
DU (Distributed Unit): Layer 1-2 processing—modulation, channel coding, scheduling decisions, MAC (medium access control). Can be centralized to an edge data center (100-200 km from cells) if fronthaul latency is <5 ms. Answers: which users get which resource blocks (PRBs) this TTI? At what modulation-coding scheme?
CU (Central Unit): Layer 3 control—RRC (radio resource control), mobility management, session setup. Tolerates latencies up to ~100 ms, so can be further centralized (regional data center, hundreds of km away).
4.3.2 The Anchor: Fronthaul Latency
The split is constrained by transport latency—the time for signals to travel between units. A 1 ms LTE subframe (the TTI) means Layer 1 processing must complete within a round-trip budget of roughly 5 ms. I/Q samples (raw radio signals) are high-bandwidth (100-500 Mbps per cell). If the DU is too far, fronthaul latency exceeds processing deadlines and the system cannot meet frame deadlines—calls drop, capacity collapses.
The result is not one split, but a menu of options (3GPP splits 1, 2, 6, 7, 8):
Option 7 (aggressive): I/Q samples flow over fronthaul to DU. Fronthaul: 100-500 Mbps, latency <5 ms. Requires premium fiber. Benefit: maximum centralization (one regional DU pool serves many cells). Cost: expensive fronthaul.
Option 2 (moderate): Layer 1 (modulation) stays at RU. Layer 2-3 go to DU. Fronthaul: 10-50 Mbps, latency <10 ms. Benefit: moderate centralization. Cost: lower fronthaul burden.
Option 6 (conservative): DU and RU co-located (RAN stays on-site). CU is centralized. Latency <100 ms. Benefit: minimal backhaul upgrade needed. Cost: no radio processing centralization.
The choice is economic and operational. Dense urban: fiber is available, Option 7 is affordable. Rural: microwave backhaul, Option 6 is pragmatic. The anchor is not physics alone—it is physics + cost + deployability constraints.
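The latency side of this choice reduces to simple arithmetic. A rough feasibility check, assuming ~5 μs/km one-way propagation in fiber and a fixed processing allowance (both illustrative figures, not from any specification):

```python
# Fronthaul feasibility sketch: which split options can a DU at
# distance d serve, given each option's round-trip latency budget?
# Assumption: light in fiber travels at ~2/3 c, i.e. ~5 us/km one way.

US_PER_KM = 5.0  # one-way propagation delay in fiber (approximate)

# round-trip latency budgets (ms) for the options discussed above
SPLIT_BUDGETS_MS = {"option7": 5.0, "option2": 10.0, "option6": 100.0}

def feasible_splits(distance_km: float, processing_ms: float = 1.0):
    """Return the split options whose budget covers round-trip
    propagation plus a fixed processing allowance."""
    rtt_ms = 2 * distance_km * US_PER_KM / 1000.0
    return [opt for opt, budget in SPLIT_BUDGETS_MS.items()
            if rtt_ms + processing_ms <= budget]

# A DU pool 200 km away: 2 ms round-trip propagation + 1 ms processing
print(feasible_splits(200))   # all three budgets still hold
print(feasible_splits(2000))  # 20 ms RTT rules out options 7 and 2
```

Within ~200 km, the budget rather than the geography is the binding constraint; at continental distances, only the conservative split survives.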
The three-unit architecture answers the state-time-coordination invariants distinctly. State is distributed: RU maintains RF hardware state; DU maintains channel state and scheduling history; CU maintains mobility state. Time horizons differ: RU operates in microseconds, DU in milliseconds (TTI = 1 ms), CU in seconds. Coordination flows vertically: CU makes session decisions (is this user attached?), DU makes per-TTI scheduling decisions, RU executes transmission. The functional split enables independent operation while maintaining vertical coordination through well-defined interfaces (fronthaul for RU↔DU, F1 for DU↔CU).
4.4 Base Station Scheduling: Centralized Coordination at the Air Interface
While the RAN disaggregates functionally (CU/DU/RU), it remains centralized in decision-making. Every millisecond (LTE TTI = 1 ms), the scheduler makes the most critical allocation decision: which users get which radio resources?
4.4.1 The Problem
A base station has finite spectrum: ~50-100 physical resource blocks (PRBs). At any TTI, hundreds or thousands of users compete for these PRBs. Each user's channel quality differs (a nearby user with clear line-of-sight can use 64-QAM modulation—6 bits per symbol; a distant user with obstruction uses QPSK—2 bits per symbol). If you allocate all PRBs to the user with the best channel, you maximize instantaneous throughput but starve the poor-channel user. If you allocate equally, you achieve fairness but sacrifice spectral efficiency (the good-channel user's potential goes unused). The scheduler must answer: given channel quality and queue state, which allocation maximizes throughput while ensuring fairness? Figure 4.2 shows the two-dimensional resource grid (frequency × time) that the scheduler partitions across users, with color intensity reflecting per-user channel quality.
OFDMA (Orthogonal Frequency Division Multiple Access) disaggregates the spectrum into discrete, non-overlapping resource elements—time-frequency blocks that can be allocated independently to different users. The base station scheduler partitions a 20 MHz channel into 100 physical resource blocks (PRBs), each 180 kHz wide and 1 millisecond deep (in LTE), creating a two-dimensional resource grid. This grid eliminates the binary decision (“transmit or defer”) that CSMA/CA faces on an undefined medium. Instead, the scheduler makes explicit allocation decisions: “User A gets PRBs 1–5 in TTI 10; User B gets PRBs 6–12 in TTI 10; User C gets PRBs 13–25 in TTI 10.” Collisions are impossible by construction because resource elements are strictly partitioned.
The scheduler works with resource elements (RE: 15 kHz × ~0.07 ms) aggregated into physical resource blocks (PRB: 12 subcarriers × 7 OFDM symbols = 84 REs). The scheduler makes one coding/modulation decision per PRB, not per RE—aggregating decisions reduces complexity but still enables fine-grained allocation across users.
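These figures are internally consistent, as a quick check shows (the 18 MHz of usable bandwidth out of a 20 MHz channel is an assumption reflecting LTE guard bands):

```python
# Resource-grid arithmetic from the definitions above.
SUBCARRIER_KHZ = 15
SUBCARRIERS_PER_PRB = 12
SYMBOLS_PER_SLOT = 7

prb_width_khz = SUBCARRIERS_PER_PRB * SUBCARRIER_KHZ   # 180 kHz per PRB
res_per_prb = SUBCARRIERS_PER_PRB * SYMBOLS_PER_SLOT   # 84 resource elements
usable_khz = 18_000                                    # 20 MHz channel minus guard bands
prbs = usable_khz // prb_width_khz                     # 100 PRBs

print(prb_width_khz, res_per_prb, prbs)  # 180 84 100
```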
4.4.2 The Measurement Signal and Closed Loop
Every 5-40 ms, each user sends a Channel Quality Indicator (CQI) report—a compact measurement (a 4-bit index in LTE) of the effective signal quality on each frequency band. The base station receives ~1000 CQI reports per cell per second. From this measurement signal, the scheduler infers what modulation-coding scheme (MCS) each user can reliably support. Reporting granularity balances measurement accuracy (report per PRB for precision) against uplink overhead (cost in spectrum). Most systems report summaries: CQI per frequency group (~10 groups instead of 100 PRBs).
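The overhead side of that granularity tradeoff is simple arithmetic. A sketch (the 4-bit CQI index, user count, and reporting rate are illustrative assumptions):

```python
# Uplink control overhead from CQI reporting, per cell.
# Assumptions (illustrative): 4-bit CQI index, 100 active users,
# one report every 20 ms (the middle of the 5-40 ms range).

CQI_BITS = 4
USERS = 100
REPORTS_PER_SEC = 50  # one report per 20 ms

def cqi_overhead_bps(reports_per_user: int) -> int:
    """Total uplink bits/s spent on CQI across all users."""
    return USERS * reports_per_user * CQI_BITS * REPORTS_PER_SEC

per_prb = cqi_overhead_bps(100)   # one CQI per PRB
per_band = cqi_overhead_bps(10)   # one CQI per frequency group
print(per_prb, per_band)  # 2000000 200000: summaries cut overhead 10x
```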
The feedback loop is tight:
- Users report CQI → (5-40 ms latency)
- Scheduler allocates PRBs and selects MCS for each user → (1 ms decision)
- Users transmit on allocated PRBs with allocated MCS → (1 ms transmission)
- Base station observes success/failure (ACK/NACK) → (1 ms feedback)
- Retransmit failed packets on next opportunity → (loop closes)
This is dramatically tighter than 802.11, where rate adaptation gathers transmission statistics over many frames and adjusts on timescales of tens to hundreds of milliseconds. The tight feedback loop enables rapid adaptation: if the channel degrades, the scheduler can select a lower MCS immediately (next TTI). If the channel improves, it can attempt higher modulation (higher risk, higher reward if successful).
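One common way this loop is closed is outer-loop link adaptation: an offset applied to the CQI-derived MCS choice is stepped down sharply on each NACK and up slowly on each ACK, so the realized error rate converges to a target. A minimal sketch (the step sizes and the 10% BLER target are illustrative, not from any standard):

```python
# Outer-loop link adaptation sketch: nudge an offset so the realized
# block error rate converges to a target. On NACK, back off sharply;
# on ACK, creep up slowly. Steps are chosen so the loop balances
# exactly at the target error rate.

TARGET_BLER = 0.10
STEP_DOWN = 0.5                                        # dB penalty per NACK
STEP_UP = STEP_DOWN * TARGET_BLER / (1 - TARGET_BLER)  # dB gain per ACK

def adjust_offset(offset_db: float, ack: bool) -> float:
    return offset_db + STEP_UP if ack else offset_db - STEP_DOWN

offset = 0.0
for ack in [True] * 9 + [False]:  # 9 ACKs then 1 NACK: exactly 10% BLER
    offset = adjust_offset(offset, ack)
print(abs(round(offset, 6)))  # 0.0: the loop balances at the target BLER
```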
4.4.3 Invariant Answers
State: Each user has queue state (backlog waiting transmission), channel state (CQI estimate), and scheduling history (was this user allocated last TTI, did it succeed?). The scheduler maintains a coarse model: (user, channel quality estimate, queue depth). Tracking per-PRB CQI for every user would consume too much state and reporting bandwidth—instead, the scheduler works with summaries.
Time: Allocation decisions happen every TTI (1 ms in LTE). CQI feedback is periodic (5-40 ms intervals). Retransmission is reactive (NACK triggers immediate re-queuing, but transmission waits for next available opportunity). The tight 1 ms cycle for allocation is the binding constraint—the scheduler must compute thousands of allocation decisions per millisecond, so algorithms are greedy heuristics, not exhaustive optimization.
Coordination: Fully centralized. The base station scheduler is the single decider. Users have no negotiating power—they report CQI (feedback), but the scheduler makes unilateral allocation decisions. This is the opposite of WiFi’s distributed CSMA/CA.
Interface: Downlink control information (DCI) tells each user which PRBs are allocated and which MCS to use. Uplink CQI reports inform the scheduler. The interface is tightly defined—3GPP specifies the CQI format and feedback intervals—but NOT the scheduling algorithm. Each vendor's base station uses proprietary scheduling logic. All must achieve fairness (no user starvation), but the heuristic (proportional fair, max-throughput, latency-based priority) is a competitive differentiator.
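As a concrete instance of such a heuristic, proportional fairness gives each PRB to the user with the highest ratio of achievable instantaneous rate to smoothed served rate. A minimal sketch (the rates and smoothing factor are illustrative; real schedulers add queue state, HARQ, and QoS weights):

```python
# Proportional-fair scheduling sketch: each TTI, give each PRB to the
# user with the highest inst_rate / avg_rate ratio, then smooth the
# averages by what was actually served.

def pf_schedule(inst_rate, avg_rate, n_prbs):
    """inst_rate[u][p]: rate user u would get on PRB p this TTI.
    avg_rate[u]: smoothed served rate. Returns {prb: user}."""
    return {p: max(range(len(avg_rate)),
                   key=lambda u: inst_rate[u][p] / avg_rate[u])
            for p in range(n_prbs)}

def update_avg(avg_rate, alloc, inst_rate, beta=0.1):
    """Exponentially smooth each user's average by its served rate."""
    served = [0.0] * len(avg_rate)
    for p, u in alloc.items():
        served[u] += inst_rate[u][p]
    return [(1 - beta) * a + beta * s for a, s in zip(avg_rate, served)]

# Two users, two PRBs: user 0 has the better channel on every PRB,
# yet PF still serves user 1 once its average falls behind.
inst = [[10.0, 10.0], [4.0, 4.0]]
avg = [1.0, 1.0]
counts = [0, 0]
for _ in range(20):
    alloc = pf_schedule(inst, avg, 2)
    for u in alloc.values():
        counts[u] += 1
    avg = update_avg(avg, alloc, inst)
print(counts)  # both users get PRBs; neither is starved
```

A max-throughput scheduler would give user 0 every PRB forever; PF's normalization by the served average is exactly what prevents starvation.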
4.4.4 Principles at Work
Disaggregation: Channel measurement (CQI reports) is separated from allocation decisions. A user measures locally and reports asynchronously; the scheduler makes synchronous decisions every TTI. This separation allows users to measure independently without waiting for scheduler acknowledgment.
Closed-Loop Reasoning: UEs report CQI → scheduler allocates → UEs transmit → success/failure observed → scheduler adjusts the next allocation based on outcomes. The loop is fast (1 ms) and tight. If CQI predictions prove pessimistic (transmissions consistently succeed), the scheduler probes higher modulation; if optimistic (retransmission rate is high), it backs off. This forms a control loop, similar to TCP congestion control but at the radio layer and on much faster timescales.
Decision Placement: Centralization is anchored by resource scarcity and coordination complexity. With thousands of users competing for 100 PRBs, a distributed approach (each user negotiates for PRBs) would create collision and overhead. Centralization enables coordinated allocation and prevents collisions. The cost is that the base station must be computationally powerful enough to make thousands of allocation decisions per millisecond—a real architectural constraint.
4.6 Network Slicing and Virtualization: Platforms that Constrain
Network slicing enables multiple independent logical networks to coexist on shared physical infrastructure. An operator allocates "slices" to different tenants (automotive, enterprise, video), each with its own QoS guarantees, failover policies, and security domains. A slice is a constrained version of the network: "your slice gets 50 PRBs, guaranteed 10 Mbps, <20 ms latency, 99.99% reliability."
4.6.1 The Abstraction and Its Constraints
Slicing is a platform—a layer of abstraction that constrains what invariant answers are feasible.
State constraints: each slice maintains independent resource pools (RBs, power budget, security tokens). State is distributed: the orchestrator tracks global allocation; each base station tracks local consumption and which slice each RB belongs to.
Time constraints: slices operate on multiple timescales. Long-term (hours-days): SLA negotiation and resource provisioning. Medium-term (seconds): traffic-aware reallocation. Short-term (milliseconds): per-TTI scheduling within the slice's resource budget.
Coordination constraints: a centralized orchestrator makes long-term decisions (allocate these RBs to the automotive slice, those to video); distributed schedulers make per-TTI decisions within those boundaries.
Interface constraints: tenants interact via an SLA API ("guarantee 10 Mbps, <20 ms latency"), not radio details. The abstraction hides PRBs and scheduling—tenants see a virtual network.
4.6.2 The Tradeoff: Isolation vs. Efficiency
Slicing promises isolation (your SLA is guaranteed) and efficiency (pack multiple tenants on shared resources). These are in tension. If every tenant gets dedicated RBs, isolation is perfect but efficiency is low (RBs idle in low-demand slices). If RBs are dynamically shared, efficiency is high but isolation is weak (one tenant’s burst can starve another until orchestrator reallocates, which takes seconds). Real systems navigate this tradeoff: most slices get reservations (guaranteed minimums) + oversubscription (shared excess capacity, first-come-first-served if available). An automotive slice gets guaranteed low-latency path (reserved short buffer, immediate scheduling); a video slice gets bulk path (lower priority, variable latency). SLA design reflects this: “guaranteed minimum 5 Mbps; best-effort up to 20 Mbps with lower latency guarantee.”
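The reservation-plus-oversubscription policy can be sketched directly (the slice names and figures are illustrative):

```python
# Slice allocation sketch: each slice first gets its guaranteed
# minimum (capped by demand), then leftover PRBs are handed out
# first-come-first-served to slices with unmet demand.

def allocate_prbs(total_prbs, slices):
    """slices: name -> (guaranteed_min, demand). Returns name -> PRBs."""
    alloc = {name: min(g, d) for name, (g, d) in slices.items()}
    leftover = total_prbs - sum(alloc.values())
    for name, (g, d) in slices.items():
        extra = min(d - alloc[name], leftover)
        alloc[name] += extra
        leftover -= extra
    return alloc

slices = {"automotive": (20, 15), "video": (30, 80), "enterprise": (10, 25)}
print(allocate_prbs(100, slices))
# automotive takes only its demand (15); video absorbs the shared excess
```

Note the tension in miniature: video's burst absorbs all 45 leftover PRBs, so enterprise is held to its guaranteed 10 even though it wanted 25—isolation holds at the minimums, efficiency comes from the shared excess.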
4.6.3 Principles at Work
Disaggregation: Slicing disaggregates resources by tenant. Traditional network: one scheduler for all users. Sliced network: per-slice scheduler, or one global scheduler that respects slice boundaries. This disaggregation enables independent SLAs: each slice has its own QoS requirements, and failure of one slice does not affect others (resource isolation).
Closed-Loop Reasoning: The orchestrator monitors slice performance (are we meeting SLAs?) and adjusts allocations. If automotive slice’s latency budget is threatened (load too high), the orchestrator can reallocate video slice’s RBs to automotive (preemption). If video slice is consistently underutilized, the orchestrator reclaims its RBs for other slices (cost optimization). The measurement (SLA compliance, resource utilization) drives reallocation decisions.
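A toy version of that reallocation loop (the utilization threshold, the halving policy, and the slice figures are all illustrative assumptions):

```python
# Orchestrator closed-loop sketch: compare measured slice latency
# against its SLA and shift PRBs from an underutilized slice.

def rebalance(slices):
    """slices: name -> dict(prbs, latency_ms, sla_ms, utilization)."""
    # candidate donors: slices using less than 30% of their allocation
    victims = [n for n, s in slices.items() if s["utilization"] < 0.3]
    for name, s in slices.items():
        if s["latency_ms"] > s["sla_ms"] and victims:
            donor = victims.pop(0)
            moved = slices[donor]["prbs"] // 2  # take half the donor's PRBs
            slices[donor]["prbs"] -= moved
            s["prbs"] += moved
    return slices

state = {
    "automotive": {"prbs": 20, "latency_ms": 25, "sla_ms": 20, "utilization": 0.95},
    "video": {"prbs": 60, "latency_ms": 80, "sla_ms": 100, "utilization": 0.2},
}
print(rebalance(state)["automotive"]["prbs"])  # 50: took half of video's PRBs
```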
Decision Placement: Centralization at the orchestrator level (who gets which RBs) combined with distribution at the scheduler level (how to use those RBs). This hybrid approach is practical: the orchestrator makes slower decisions (timescale: seconds), while schedulers make fast decisions (timescale: milliseconds) within constraints.
4.7 The 5G Core: Microservices and Service-Based Architecture
4.7.1 From Monolith to Microservices
The 4G core was monolithic: one HSS appliance stored all subscriptions, one MME handled all attachments, one S-GW anchored all sessions. The core was vertically integrated—a vendor sold a complete box; operators couldn’t pick and choose. Scaling one function meant buying bigger appliances; new features required vendor upgrades.
The 5G core disaggregates: functions become stateless microservices running on commodity cloud infrastructure (Kubernetes, AWS, Azure, on-premises clouds). Services: AMF (Access and Mobility Management Function), SMF (Session Management Function), UPF (User Plane Function), PCF (Policy Control Function), UDM (Unified Data Management), NRF (Network Repository Function). Each is independent: if session load spikes, add SMF instances; if user-plane throughput is the bottleneck, add UPF instances. With the monolithic 4G core, you'd scale the entire core (wasting resources on lightly loaded functions like subscription lookup).
4.7.2 Service-Based Interfaces and Choreography
Services communicate via REST APIs (Service Based Interface, SBI). When a user attaches:
- UE → AMF (attach request)
- AMF → NRF (discover SMF) → returns SMF address
- AMF → SMF (create session)
- SMF → NRF (discover policy function) → returns PCF address
- SMF → PCF (get policy) → returns rate limits, charging info
- SMF → UPF (establish data path) → configures packet forwarding
- SMF → AMF (session created) → AMF confirms to UE
This is choreography, not orchestration. No central conductor decides the flow; services react to messages and call each other’s APIs. Each service owns a piece of state: AMF owns attachment registrations, SMF owns sessions, UDM owns subscriptions, PCF owns policy. Consistency is loose (eventual consistency)—if SMF crashes, another SMF reloads from persistent storage and resumes; brief session loss, but system recovers (RTO: 10-30 seconds).
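The choreography above can be mimicked in a few lines: a registry dict stands in for the NRF, and plain function calls stand in for REST requests. The names mirror the 5G functions, but the flow is a simplified sketch, not the full 3GPP sequence:

```python
# Choreographed attach sketch: each function discovers its peer via a
# registry (the NRF) and calls it directly; no central conductor.

registry = {}  # NRF: service name -> handler (stands in for an endpoint)

def register(name):
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register("pcf")
def get_policy(user):
    return {"rate_limit_mbps": 20}

@register("upf")
def establish_path(user, policy):
    return {"user": user, "forwarding": "configured", **policy}

@register("smf")
def create_session(user):
    policy = registry["pcf"](user)        # SMF -> NRF lookup -> PCF
    return registry["upf"](user, policy)  # SMF -> UPF

@register("amf")
def attach(user):
    return registry["smf"](user)          # AMF -> NRF lookup -> SMF

print(attach("ue-42"))  # session context assembled AMF -> SMF -> PCF/UPF
```

The key property: AMF never knows about PCF or UPF; each service knows only its immediate peers and the registry, which is what makes functions independently replaceable.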
4.7.3 Invariant Answers in Microservice Architecture
State: Distributed and decoupled from hardware. Each service holds minimal state needed for its function. AMF doesn’t care about sessions; SMF doesn’t care about attachments. State is stored in databases (UDM, policy stores), enabling stateless service instances. If an SMF instance crashes, another instance reads state from database and continues. This enables horizontal scaling: add SMF instances without losing state.
Time: Multiple timescales. Session events (setup, deletion): milliseconds to seconds. Service discovery (NRF lookup): milliseconds. Policy updates: seconds to minutes. Each service operates asynchronously; messages are delivered with some latency (network hops add milliseconds). A user attach involves ~5-7 service calls, and each hop adds latency: a complex attach sequence totals 50-200 ms (acceptable for session setup; it would be problematic per-packet).
Coordination: Distributed via choreography. NRF is the service registry but not a centralized decider. Each service makes local decisions (AMF decides where to attach user; SMF decides which UPF to use; PCF decides rate limits). No single orchestrator coordinates; services coordinate implicitly through message passing. Advantage: resilient (no single point of failure at the control level). Disadvantage: harder to guarantee global consistency (if messages are lost or delayed).
Interface: REST APIs (N10, N11, etc.). Each service exposes endpoints. Contrast with 4G’s proprietary protocols (Diameter, GTP)—REST is simpler, language-agnostic, and leverages web infrastructure. Debugging is easier (standard HTTP tools work); monitoring is standard (Prometheus/Grafana); security is standard (TLS).
4.7.4 Cloudification: Software, Not Hardware
Cloudification is the realization of disaggregation. Disaggregate functions (conceptual separation); virtualize them (run on commodity servers); commoditize infrastructure (Kubernetes, cloud APIs). The result: the network is software running on hardware you don't own (cloud) or on hardware you choose (commodity servers). This shifts economics from CapEx (buy specialized appliances upfront) to OpEx (pay per compute hour used).
An operator wanting 5G network has two paths:
Path A (Traditional): Buy integrated RAN+core from vendor (Nokia, Ericsson). CapEx: $10M upfront for hardware, software licenses, integration. Constraint: locked into vendor’s roadmap and pricing.
Path B (Cloud-Native): Deploy open-source 5G (ONAP, OSM) on cloud (AWS, Azure, or on-premises). CapEx: $2M upfront for software engineering. OpEx: $0.5M/month cloud compute. Benefit: flexibility (swap vendors, update faster, pay only for used capacity). Cost: higher OpEx if load is constant (cloud compute is more expensive than dedicated hardware at scale).
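Under the figures above, the two paths can be compared as cumulative cost over time (a sketch; real comparisons add staffing, licenses, and load variability, and the zero ongoing OpEx for Path A is a simplifying assumption):

```python
# Cumulative cost ($M) of Path A (CapEx-heavy) vs Path B (OpEx-heavy).

def path_a_cost(months, capex=10.0, opex=0.0):
    """Traditional: big upfront spend, ongoing OpEx ignored here."""
    return capex + opex * months

def path_b_cost(months, capex=2.0, opex=0.5):
    """Cloud-native: small upfront spend, steady cloud bill."""
    return capex + opex * months

# Path B starts cheaper; Path A wins once B's OpEx has eaten the gap.
breakeven = (10.0 - 2.0) / 0.5
print(breakeven)                         # 16.0 months
print(path_a_cost(36), path_b_cost(36))  # 10.0 vs 20.0 at three years
```

This is why the text's caveat matters: with constant load and a long horizon, dedicated hardware amortizes; the cloud path pays off through flexibility, not raw cost.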
4.7.5 The Anchor: Load Heterogeneity and Optimization
The anchor constraint that drives cloudification is heterogeneous load profiles. Session setup (SMF) load spikes with attachment storms; policy enforcement (PCF) load spikes with quota checks; user plane (UPF) load is continuous. A monolithic core must be scaled as a unit: to give the busiest function its peak capacity, you replicate the whole appliance, over-provisioning the lightly loaded functions bundled with it. A microservice core is sized per function: if SMF peaks at 100 load units, PCF at 30, and UPF at 500, you provision SMF for 100, PCF for 30, UPF for 500. Combined CapEx is lower (no over-provisioning of low-demand functions).
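The sizing argument can be made concrete (the load units, per-appliance capacities, and fixed bundling are illustrative assumptions):

```python
# Provisioning sketch: a monolithic core scales in whole-appliance
# units that bundle every function at a fixed capacity; microservices
# provision each function to its own peak.
import math

peaks = {"smf": 100, "pcf": 30, "upf": 500}   # peak load per function
unit = {"smf": 50, "pcf": 50, "upf": 50}      # capacity per appliance

# Monolith: enough appliances that every bundled function meets its peak.
appliances = max(math.ceil(p / unit[f]) for f, p in peaks.items())
monolith_capacity = {f: appliances * unit[f] for f in peaks}

# Microservices: provision each function independently.
micro_capacity = dict(peaks)

print(appliances, monolith_capacity)
# 10 appliances: PCF ends up with 500 units of capacity for a peak of 30
```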
Additionally, cloudification enables faster innovation: push a new SMF version without restarting the core; route traffic gradually to the new version; roll back if issues arise. With monolithic cores, feature updates require planned downtime (every 6-12 months). With microservices, updates are continuous (daily or hourly).
4.9 Generative Exercises
4.9.1 Exercise 1: Private 5G Factory Floor
A manufacturing company deploys a private 5G network on its factory floor. Unlike a public network (millions of users, massive scale), this network has ~1000 employees, controlled environment, and single administrative domain. The question: how do the invariant answers change when administrative boundaries collapse?
Analysis:
State: In a public network, state must be distributed (users roam, administrative borders are hard). In a private network, state can be centralized (all users belong to one company, all devices registered at one place). How might the core architecture change if you had one central UDM, one central AMF, one UPF at the factory center?
Time: In a public network, handoff must be fast (users move between operators’ networks, delays are visible). In a private network, handoff can be slower (all base stations are within company control, coordination is simpler). Could you reduce handoff latency if there were no roaming delays? What if handoff were coordinated across all cells centrally (one orchestrator)?
Coordination: In a public network, no entity controls all cells (multi-vendor, multi-operator). In a private network, a single company controls all infrastructure. What if the factory deployed one global scheduler for all cells (versus distributed schedulers at each cell)? What would be the benefits (optimal spectrum allocation across cells) and costs (central scheduler bottleneck)?
Interface: In a public network, standardized interfaces (X2 for eNB-eNB, N11 for SMF-UPF) enable interoperability. In a private network, you could use proprietary interfaces if you control all equipment. Would that accelerate performance or introduce unnecessary coupling?
Hypothesis: Collapsed administrative boundaries enable centralization (state, coordination, control). This trades off resilience (no fallback if central element fails) for optimization (global visibility enables better decisions). The design should reflect the risk tolerance: a factory can tolerate brief outages (production paused for repair); a public network cannot (millions of users affected).
4.9.2 Exercise 2: 5G RAN Disaggregation Economics
An operator has 100 cells in a dense city. Today, it uses monolithic base stations ($100k each) = $10M hardware cost. It is considering disaggregation: 100 RUs ($10k each) at cells, 2 centralized DU/CU data centers. What is the tradeoff?
CapEx Analysis:
- RU hardware: 100 × $10k = $1M
- Fronthaul deployment (fiber to each RU): $40k per RU × 100 = $4M
- DU/CU data center equipment: $1M
- Total: $6M (lower than the $10M monolith)
But there are implicit costs:
- Fronthaul latency is a tight constraint (5 ms max for the Layer 1 split). Is every site reachable with <5 ms? If not, fall back to a Layer 2 or Layer 3 split, reducing the optimization benefit.
- Operational complexity: orchestration, monitoring distributed DUs/CUs, debugging failures spread across geography.
Hypothesis: Disaggregation is cheaper at scale (as in the 100-cell example). At small scale (on the order of 10 cells), the fronthaul cost dominates. The breakeven point depends on fiber availability and labor costs.
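Under the exercise's unit prices, the breakeven falls out directly (this uses only the stated hardware costs, ignoring OpEx and fronthaul feasibility):

```python
# Monolithic vs disaggregated RAN hardware cost ($k) vs cell count,
# using the exercise's unit prices.

def monolith_cost_k(cells, per_bts_k=100):
    """Monolithic base stations only."""
    return cells * per_bts_k

def disagg_cost_k(cells, per_ru_k=10, fronthaul_per_ru_k=40, dc_total_k=1000):
    """RUs + per-site fronthaul fiber + shared DU/CU data centers."""
    return cells * (per_ru_k + fronthaul_per_ru_k) + dc_total_k

print(monolith_cost_k(100), disagg_cost_k(100))  # 10000 6000 ($10M vs $6M)
# Breakeven: 100n = 50n + 1000  =>  n = 20 cells
print(monolith_cost_k(20), disagg_cost_k(20))    # 2000 2000
```

Notably, the 20-cell operator in the design exercise sits exactly at breakeven under these figures, which is why the non-cost factors become decisive there.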
Design exercise: For a smaller operator with 20 cells (where disaggregation isn’t obviously cheaper), what other factors might justify it? (Vendor flexibility? Faster feature deployment? Energy efficiency through centralized cooling?)
4.10 Summary: Disaggregation as the Organizing Principle
The entire 5G architecture story is disaggregation applied at infrastructure scale. Starting from a monolithic 2G base station and MSC, each generation peeled away coupling:
- 3G: voice and data paths separate (first disaggregation)
- 4G: all-IP unification, distributed core functions (MME, S-GW, P-GW), but still hardware-bound
- 5G: complete atomization into stateless microservices + RAN split (CU/DU/RU) + cloud deployment
At each step, the anchor constraint shifted. 2G: circuit switching (rigidity). 3G: parallel voice+data (incremental change). 4G: unified IP transport (opens door to disaggregation). 5G: cloud commodity hardware (enables software-defined networks).
The principles drive design:
Disaggregation: Separating coupled concerns enables independent scaling, updating, deployment. CU/DU/RU split isolates radio processing from control. Core microservices isolate session management from policy from user data. Each separation creates a new interface; each interface is an opportunity for decoupling and optimization.
Closed-Loop Reasoning: Base station schedulers adapt MCS based on CQI feedback (1 ms loop). Mobility applies hysteresis to avoid ping-pong oscillation. 5G core uses measurement-driven policy updates (PCF adjusts limits based on traffic metrics). All loops operate at different timescales; all require stable feedback signals.
Decision Placement: From centralized (4G MME controls mobility, S-GW anchors sessions) to distributed (5G AMF, SMF, UPF make localized decisions via choreography). Distribution reduces single points of failure but complicates consistency. The sweet spot varies by constraint: admission control (centralized, requires global view) vs. packet forwarding (distributed, local decisions sufficient).
Network slicing is the platform that constrains what invariant answers are feasible. It is not a new invariant; it is a constraint layer that partitions available resources and enforces SLA boundaries. Functions beneath the slice (scheduler, handoff) must respect the slice's resource budget.
The central insight: architecture is how you answer the four invariants under constraints. As environmental constraints shift (from voice to data, from monolithic hardware to commodity servers, from single operator to shared infrastructure), the answers shift. Disaggregation is the strategy that keeps the system decomposable—each function can evolve independently, limited only by its interface.
4.11 References
- 3GPP (2018). “3GPP TR 38.912: Study on New Radio (NR) Access Technology Physical Layer Aspects.” 3GPP Technical Report 38.912.
- 3GPP (2022). “TS 23.501: System Architecture for 5G.” 3GPP Technical Specification 23.501.
- Bria, A., Gessler, F., Queseth, O., Stridh, R., Unbehaun, M., Wu, J., and Zander, J. (2001). "4th Generation Wireless Infrastructures – Scenarios and Research Challenges." Proc. 12th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC).
- Fettweis, G. P. and Alamouti, S. (2014). “5G: Personal Mobile Internet and the Internet of Things.” IEEE Wireless Communications Magazine, 21(2):64-75.
- Hoang, H., Harada, H., Mori, K., and Sato, Y. (2009). “Aspects of Mobile Broadband Wireless Access.” IEEE Wireless Communications Magazine, 16(5):36-42.
- Jacobson, V. (1988). “Congestion Avoidance and Control.” Proc. ACM SIGCOMM.
- Kurose, J. F. and Ross, K. W. (2020). Computer Networking: A Top-Down Approach, 8th ed. Pearson.
- Lamport, L. (1978). “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM, 21(7):558-565.
- McKeown, N., Anderson, T., Balakrishnan, H., et al. (2008). “OpenFlow: Enabling Innovation in Campus and Enterprise Networks.” ACM SIGCOMM Computer Communication Review, 38(2):69-74.
- NGMN (Next Generation Mobile Networks) Alliance (2015). “5G White Paper.” NGMN.
- Open RAN Alliance (2022). “Open RAN Explained.” O-RAN Alliance White Paper.
- Richter, F., Fehske, A. J., and Fettweis, G. P. (2009). “Energy Efficiency Aspects of Base Station Deployment.” Proc. IEEE VTC 2009 Fall.
- Saltzer, J. H., Reed, D. P., and Clark, D. D. (1984). “End-to-End Arguments in System Design.” ACM Trans. Computer Systems, 2(4):277-288.
- Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27(3):379-423.
- Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
This chapter is part of “A First-Principles Approach to Networked Systems” by Arpit Gupta, UC Santa Barbara, licensed under CC BY-NC-SA 4.0.