The Streaming Control Loop: Buffering, DASH, ABR

Last lecture identified the Time invariant as the anchor for multimedia: when must data arrive? Netflix has no deadline — the movie was recorded months ago, the user presses play whenever they want. This “no deadline” luxury enables a massive client-side buffer (60–200 seconds of video held in reserve) and the use of HTTP/TCP for delivery (firewalls pass it, lost packets are retransmitted, the buffer absorbs the delay). The architecture that emerged is DASH: the video is divided into short segments (2–10 seconds each), each available at multiple bitrates via a manifest file, and the client downloads one segment at a time, choosing which bitrate to request based on current network conditions.

We ended L10 with this closed-loop system working — but with a warning. The client’s bitrate choice depends on predicting the network’s future behavior from past observations. When that prediction fails: the client oscillates between quality levels, the buffer drains and playback freezes, or multiple clients on the same link interfere with each other. We named these failure modes but deferred the solutions.

Today we unpack the control loop itself. The question is: given that DASH works and the buffer is our shock absorber, how should the client choose the next bitrate — and what happens when it gets it wrong? This turns out to be a surprisingly deep control problem, and the research community spent a decade on it. We trace four generations of ABR algorithms, each motivated by what broke in the previous one:

Throughput-based ABR (the naive first attempt) — estimate bandwidth, pick the matching bitrate.
Buffer-Based Adaptation (BBA) — Huang et al. 2014 [1] — stop estimating bandwidth; use buffer level as the sensor instead.
Model Predictive Control (MPC) — Yin et al. 2015 [2] — use both buffer and throughput, but optimize over a horizon.
Neural ABR (Pensieve) — Mao et al. 2017 [3] — replace hand-designed control with a learned policy.

Before turning to the algorithms: what are we optimizing for?

The objective: what users actually care about

Before diving into algorithms, we need to define the goal. What does “good streaming” mean to the user? The research (Dobrian et al. 2011 [6]) converges on three findings that establish a strict priority order:

Finding 1: Rebuffering is catastrophic. A single mid-stream freeze, even one lasting 1-2 seconds, causes a dramatic increase in user abandonment. Dobrian et al. report that a 1% increase in buffering ratio can cut viewing time by more than three minutes for a 90-minute live event [6]. Users tolerate startup delay (they expect a loading indicator), but a freeze 20 minutes into a movie feels like the service is broken.

Finding 2: Quality stability matters more than peak quality. Users prefer a steady 720p stream over one that oscillates between 1080p and 480p every 10 seconds, even though the oscillating stream has higher average bitrate. Visible quality changes are jarring [6].

Finding 3: Higher quality is better — but only after the first two are satisfied. Given no freezes and stable quality, users prefer 1080p over 720p. But this is the lowest priority.

The QoE hierarchy — the objective every ABR algorithm must satisfy:

Avoid rebuffering (highest priority)
Minimize quality switches (second priority)
Maximize video quality (third priority — subject to the first two)

Key takeaway: If you remember one thing from this lecture, it is this priority order: no freezes > stable quality > high quality. A 720p stream that never stalls beats a 4K stream that freezes twice. Every ABR algorithm we study today is an attempt to satisfy this hierarchy.

Act 1: Throughput-based ABR — the obvious first attempt

The algorithm

When DASH was first deployed (circa 2011-2012), the ABR algorithm in most players was simple [4][5]:

Download segment N. Measure how long it took. Compute throughput: bytes received / download time.
Smooth the estimate using an exponential moving average (the same technique TCP uses for RTT estimation): estimated_throughput = 0.1 × latest_measurement + 0.9 × previous_estimate. Older samples decay geometrically, so recent history counts more than old data, but with only 10% weight on the newest sample the estimate reacts slowly to sudden changes.
Pick the highest bitrate from the ladder that is safely below the estimated throughput — typically at most 85% of the estimate (e.g., if estimated throughput is 10 Mbps, request at most 8.5 Mbps).
Download the next segment at that bitrate.
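The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular player: the function names, the six-rung ladder in the usage note, and the choice to seed the average with the first sample are assumptions. The 0.1/0.9 smoothing weights and the 85% margin come from the text.

```python
from typing import Optional, Sequence

ALPHA = 0.1    # weight on the latest sample, as in the text
SAFETY = 0.85  # request at most 85% of the estimate


def measure_throughput(segment_bits: float, download_seconds: float) -> float:
    """Throughput in Mbps from bits received and elapsed download time."""
    return segment_bits / download_seconds / 1e6


def update_estimate(previous: Optional[float], sample_mbps: float,
                    alpha: float = ALPHA) -> float:
    """Exponential moving average; the first sample seeds the estimate (assumption)."""
    if previous is None:
        return sample_mbps
    return alpha * sample_mbps + (1.0 - alpha) * previous


def pick_bitrate(ladder_mbps: Sequence[float], estimate_mbps: float,
                 safety: float = SAFETY) -> float:
    """Highest rung not exceeding safety * estimate; lowest rung as a fallback."""
    budget = safety * estimate_mbps
    eligible = [rate for rate in ladder_mbps if rate <= budget]
    return max(eligible) if eligible else min(ladder_mbps)
```

With a hypothetical ladder of [0.5, 1, 2, 3, 5, 8] Mbps and an estimate of 10 Mbps, `pick_bitrate` returns the 8 Mbps rung (budget 8.5 Mbps). If the link then drops to 1 Mbps, one new sample only moves the estimate to about 9.1 Mbps, so the next request is still 5 Mbps.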

This is a myopic algorithm — it looks at one measurement at a time, makes one decision, and moves on. The feedback loop has a period of one segment (typically 2-10 seconds).

Why it seemed reasonable

The algorithm mirrors what you would do intuitively. If you downloaded the last chunk at 5 Mbps, you probably have roughly 5 Mbps available, so request a 4 Mbps chunk (leaving some margin). It works well on stable wired connections where bandwidth changes slowly.

What broke: three failure modes

Failure mode 1: Overestimation on volatile networks. On WiFi, cellular, or shared broadband, throughput fluctuates rapidly. The smoothed estimate lags behind reality. The client requests a 5 Mbps segment when the network has already dropped to 1 Mbps. The download takes 5x longer than expected. The buffer drains. If the buffer empties before the download finishes, the video stalls [1][4].

Let’s trace the arithmetic. Suppose the buffer holds 10 seconds of video and a segment is 4 seconds long. At the requested bitrate of 5 Mbps, the segment is 5 × 4 = 20 Mbits. At the actual throughput of 1 Mbps, the download takes 20 seconds. During those 20 seconds, the buffer drains from 10 seconds to zero — the video freezes at second 10 of the download. The user sees a rebuffering spinner for 10 more seconds.
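The same arithmetic as a small helper, assuming constant throughput for the whole download and real-time playback drain (one second of video consumed per second). The function name and signature are illustrative.

```python
def stall_seconds(buffer_s: float, segment_s: float,
                  bitrate_mbps: float, throughput_mbps: float) -> float:
    """Seconds of frozen playback while a single segment downloads."""
    segment_mbits = bitrate_mbps * segment_s        # size of the requested segment
    download_s = segment_mbits / throughput_mbps    # time to fetch it
    return max(0.0, download_s - buffer_s)          # time spent with an empty buffer


# The example from the text: 10 s buffer, 4 s segment, 5 Mbps requested, 1 Mbps available.
print(stall_seconds(10.0, 4.0, 5.0, 1.0))  # 10.0
```

Requesting the same segment at 1 Mbps instead would take 4 seconds to download and cause no stall at all.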

Failure mode 2: Oscillation. This is the oscillation problem we flagged at the end of L10; now we see exactly how it happens. Suppose the available bitrate ladder has steps at 3 and 5 Mbps. The client measures 4 Mbps and picks the 3 Mbps bitrate (the highest rung below 85% of 4 Mbps). The segment downloads quickly (the smaller file finishes fast on the 4 Mbps link), so the measured throughput appears even higher, say 6 Mbps. The client jumps to 5 Mbps (85% of 6 Mbps is 5.1 Mbps). Now the segment is larger, the download takes longer, and the measured throughput drops. The client falls back to 3 Mbps. Repeat. The quality swings visibly between high and low every few segments, a jarring experience [1][5].

This is a classic feedback instability. The control action (choosing a higher bitrate) changes the measurement (a larger file takes longer to download and looks like lower throughput), which triggers the opposite action. The loop feeds back into itself.

Failure mode 3: Underutilization. To avoid the first two problems, engineers made the safety margin even more conservative: some implementations request only 60% of estimated throughput (down from the typical 85%). But this means the client systematically chooses a lower bitrate than the network can sustain. On a 10 Mbps link, the client requests 6 Mbps when 8 Mbps would play smoothly. The user gets worse quality than the link could deliver, and network capacity goes unused [1][4].

The core problem

All three failure modes trace to the same root cause: past throughput is an unreliable predictor of future throughput [1][4]. The client measures what happened during the last 4-second segment download. It needs to predict what will happen during the next 4-second download. On a network where bandwidth can swing by 10x in seconds — WiFi interference, cellular handoffs, a roommate starting a large download — the prediction is often wrong.

This is the same challenge TCP congestion control faces (L3): the sender must guess the network’s available capacity from delayed, indirect measurements. But DASH has it worse. TCP adjusts every RTT (50-100 ms). DASH adjusts every segment (2-10 seconds). The feedback loop is roughly 20-200x slower, about two orders of magnitude.

Act 2: Buffer-Based Adaptation (BBA) — Huang et al. 2014

The insight: stop predicting, start observing

In 2014, Te-Yuan Huang and colleagues at Stanford, working with Netflix, published a remarkably simple idea: ignore the throughput estimate entirely and use the buffer level as the sole input to the ABR algorithm [1].

Why buffer level? Because the buffer integrates all the information you care about. If the buffer is growing, the network is delivering data faster than the video is consuming it — regardless of what the measured throughput says. If the buffer is shrinking, the opposite is true. The buffer is a physical sensor of the network’s actual impact on the ongoing playback experience (though at startup, when the buffer is empty, this sensor has no signal — a weakness we will address below).

Think of it as a bathtub. Water flows in (downloaded video data) and drains out (playback consumption at a constant rate). You don’t need to measure the faucet’s flow rate. You just look at the water level. If it’s rising, you can afford more. If it’s falling, cut back.

The rate map

BBA defines a simple mapping — the rate map — from buffer level B (in seconds) to bitrate [1]:

If B < B_min (e.g., 10 seconds): request the lowest bitrate. The buffer is dangerously low. Survival mode: fill the buffer as fast as possible.
If B > B_max (e.g., 60 seconds): request the highest bitrate. The buffer is full. The network has proven it can sustain high quality. Enjoy it.
If B_min ≤ B ≤ B_max: linearly interpolate between the lowest and highest bitrate based on where B falls in the range (B = B_min maps to the lowest bitrate, B = B_max to the highest).

The mapping is monotonically increasing: more buffer implies higher quality, always. And there is no throughput estimation anywhere in the algorithm.
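A minimal sketch of such a rate map, using the example thresholds from the text (B_min = 10 s, B_max = 60 s). It is a simplification for illustration, not the deployed algorithm from [1]: the ladder values are hypothetical, and snapping the interpolated target down to the nearest available rung is an assumption of this sketch.

```python
from typing import Sequence


def rate_map(buffer_s: float, ladder_mbps: Sequence[float],
             b_min: float = 10.0, b_max: float = 60.0) -> float:
    """Map buffer occupancy (seconds) to a bitrate rung; no throughput input."""
    ladder = sorted(ladder_mbps)
    lowest, highest = ladder[0], ladder[-1]
    if buffer_s <= b_min:
        return lowest                      # survival mode
    if buffer_s >= b_max:
        return highest                     # buffer is comfortably full
    fraction = (buffer_s - b_min) / (b_max - b_min)
    target = lowest + fraction * (highest - lowest)
    # Snap down to a rung that actually exists (assumption of this sketch).
    return max(rate for rate in ladder if rate <= target)


ladder = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]    # hypothetical ladder, Mbps
print(rate_map(5.0, ladder), rate_map(35.0, ladder), rate_map(70.0, ladder))
# 0.5 3.0 8.0
```

At 35 seconds of buffer the interpolated target is 4.25 Mbps, which snaps down to the 3 Mbps rung.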

Why it works

Stability. Consider what happens when the network drops. The buffer starts draining. As B falls, the rate map selects a lower bitrate. The lower bitrate downloads faster, so the buffer stops draining and the system settles at an equilibrium. Oscillation is far less likely, because the algorithm has a single monotone mapping spread over a wide buffer range (B_min to B_max): the buffer must traverse a large range before the bitrate changes significantly.
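The equilibrium argument can be checked with a toy simulation. The sketch below makes several simplifying assumptions that are not from the text: the bitrate is continuous rather than quantized to a ladder, the link capacity is constant after the drop, and the rate map runs from 0.5 to 8 Mbps over the 10-60 second range.

```python
def continuous_rate_map(buffer_s, r_min=0.5, r_max=8.0, b_min=10.0, b_max=60.0):
    """Linear rate map without ladder quantization (assumption of this sketch)."""
    if buffer_s <= b_min:
        return r_min
    if buffer_s >= b_max:
        return r_max
    return r_min + (r_max - r_min) * (buffer_s - b_min) / (b_max - b_min)


def simulate(buffer_s, capacity_mbps, segments, segment_s=4.0):
    """Download `segments` segments back to back; return (buffer, rate) per step."""
    history = []
    for _ in range(segments):
        rate = continuous_rate_map(buffer_s)
        download_s = rate * segment_s / capacity_mbps
        # Playback drains the buffer during the download, then the segment is added.
        buffer_s = max(0.0, buffer_s - download_s) + segment_s
        history.append((round(buffer_s, 2), round(rate, 2)))
    return history


# Link drops to 2 Mbps while the buffer holds 40 s.
for step in simulate(40.0, 2.0, 60)[:4]:
    print(step)
```

Under these assumptions the buffer falls monotonically toward 20 seconds, the level at which the rate map selects exactly 2 Mbps, and stays there without overshoot.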

No prediction needed. BBA makes no assumption about future throughput. It reacts to what has already happened (the buffer level reflects the cumulative history of downloads vs. playback). This eliminates the prediction errors that caused throughput-based ABR to overestimate or underestimate [1].

Netflix deployment. Huang et al. deployed BBA on Netflix — at the time serving roughly one-third of U.S. internet traffic — and observed a 10-20% reduction in rebuffering events compared to throughput-based ABR, with no measurable decrease in average video quality [1].

What broke: the startup problem and the slow ramp

BBA has a fundamental weakness, and it surfaces at the moment users care most: startup [1][2].

When the user presses play, the buffer is empty. B = 0. According to the rate map, the algorithm must request the lowest bitrate — 145 kbps, blocky and blurry. Even as the network delivers data quickly and the buffer begins filling, BBA raises quality slowly, because the rate map is linear and the buffer must traverse the entire range from B_min to B_max. On a fast network (20 Mbps), the buffer might fill to 60 seconds within 10-15 seconds, but during those first 10-15 seconds, the user sees progressively improving but still mediocre quality.
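A rough simulation of this cold start, under assumptions that are not from the text: a continuous linear rate map from 0.5 to 8 Mbps between 10 and 60 seconds of buffer, 4-second segments, a constant-capacity link, and playback that starts draining the buffer immediately.

```python
def startup_ramp(capacity_mbps, r_min=0.5, r_max=8.0,
                 b_min=10.0, b_max=60.0, segment_s=4.0):
    """Simulate BBA-style startup from an empty buffer.

    Returns a list of (wall-clock seconds, bitrate requested) per segment,
    stopping once the buffer reaches b_max.
    """
    buffer_s, clock_s, trace = 0.0, 0.0, []
    while buffer_s < b_max:
        if buffer_s <= b_min:
            rate = r_min
        else:
            rate = r_min + (r_max - r_min) * (buffer_s - b_min) / (b_max - b_min)
        download_s = rate * segment_s / capacity_mbps
        clock_s += download_s
        buffer_s = max(0.0, buffer_s - download_s) + segment_s
        trace.append((round(clock_s, 2), round(rate, 2)))
    return trace


trace = startup_ramp(20.0)
print(len(trace), trace[-1])
```

With these hypothetical parameters, a 20 Mbps link needs 19 segments and roughly 15 seconds of wall-clock time to fill the buffer, and every one of those segments is requested below the top bitrate even though the link could have carried it from the first request.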

This is the cold start problem — a “broken invariant” in the language of the BBA paper [1]. The buffer level encodes information about past network conditions, but at startup there is no past. The sensor has no signal. BBA must fall back to a conservative default until the buffer accumulates enough history.

Compare this to throughput-based ABR: it measures the first segment download, gets a throughput estimate, and immediately jumps to a matching bitrate. The first 4 seconds might be low quality, but the second segment is already at the right level. Throughput-based ramps up faster.

The second weakness: waste on fast links

On a stable, high-bandwidth connection (say a wired 100 Mbps link), BBA works but wastes time. The network can sustain the highest bitrate immediately, but BBA insists on filling the buffer through the linear ramp. The user spends the entire ramp watching quality slowly improve when it could have been perfect from the start [2].

Throughput-based ABR, despite its instability, gets the right answer faster on stable networks. BBA is more robust but more conservative. The question is: can you get both?

Act 3: Model Predictive Control (MPC) — Yin et al. 2015

The insight: combine both signals, optimize over a horizon

In 2015, Xiaoqi Yin and colleagues asked: throughput-based ABR uses throughput but ignores the buffer. BBA uses the buffer but ignores throughput. What if you used both — and optimized over multiple segments instead of just the next one? [2]

This is Model Predictive Control (MPC), a technique from control theory (used in chemical engineering, robotics, autonomous vehicles) applied to video streaming. The idea:

Model the system: the buffer fills when data arrives and drains at playback rate. Future buffer level is a function of current buffer, predicted throughput, and chosen bitrate.
Predict future throughput using past measurements, typically the harmonic mean of the last 5 segments as a conservative estimate (the harmonic mean is pulled toward low samples and is never larger than the arithmetic mean, so a single fast download cannot inflate the prediction).
Optimize bitrate choices over the next K segments (typically K = 5, spanning 10-50 seconds of video for 2-10 second segments) to maximize a QoE objective function.
Act on only the first decision. Then re-predict and re-optimize. This is the “receding horizon” — the same strategy used in self-driving cars.
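The first two steps, the buffer model and the throughput predictor, can be sketched as follows. The function names and the sample values are illustrative assumptions; `statistics.harmonic_mean` is part of the Python standard library.

```python
from statistics import harmonic_mean, mean


def predict_throughput(samples_mbps, window=5):
    """Conservative prediction: harmonic mean of the most recent samples."""
    return harmonic_mean(samples_mbps[-window:])


def next_buffer(buffer_s, bitrate_mbps, throughput_mbps, segment_s=4.0):
    """One step of the buffer model.

    Returns (buffer after the segment arrives, seconds of rebuffering).
    """
    download_s = bitrate_mbps * segment_s / throughput_mbps
    rebuffer_s = max(0.0, download_s - buffer_s)
    new_buffer_s = max(0.0, buffer_s - download_s) + segment_s
    return new_buffer_s, rebuffer_s


samples = [8.0, 8.0, 8.0, 8.0, 1.0]          # one slow download among fast ones
print(round(mean(samples), 2), round(predict_throughput(samples), 2))  # 6.6 3.33
print(next_buffer(15.0, 3.0, 4.0))           # (16.0, 0.0)
```

A single slow sample pulls the harmonic mean down to about 3.3 Mbps while the arithmetic mean stays at 6.6 Mbps, which is why the harmonic mean is the more cautious input for planning.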

The QoE objective function

What does “optimize QoE” mean concretely? Yin et al. defined a weighted sum of three terms [2]:

$$\mathrm{QoE} = \sum_{k=1}^{K} \Big[\, \text{quality}(\text{bitrate}_k) \;-\; \mu \cdot \text{rebuffer\_time}_k \;-\; \lambda \cdot \lvert \text{quality\_switch}_k \rvert \,\Big]$$

Where:

quality(bitrate) measures perceptual quality, not raw kbps. Quality follows diminishing returns: quality(145k) = 1, quality(771k) = 3, quality(2358k) = 4, quality(5800k) = 4.5. The jump from 145 kbps to 771 kbps is huge; from 2,358 to 5,800 is modest.
rebuffer_time is the number of seconds the video would freeze while waiting for a segment to finish downloading (zero if the buffer doesn’t empty, positive if it does). Weighted by μ (a large penalty — rebuffering is catastrophic).
|quality_switch| is the magnitude of the quality change between consecutive segments. Weighted by λ (a moderate penalty: visible quality swings are annoying but not as bad as stalling).

This function encodes a precise claim about human perception: stalling is worse than quality oscillation, which in turn is worse than low quality. The weights μ and λ are tuned from user studies (Dobrian et al. 2011 [6]).
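The objective can be evaluated directly for a candidate sequence. In the sketch below the quality scores are the four values quoted above; the weights μ = 4 and λ = 1 are hypothetical placeholders chosen only to make the example concrete, not values from [2].

```python
# Perceptual quality scores from the text, keyed by bitrate in kbps.
QUALITY = {145: 1.0, 771: 3.0, 2358: 4.0, 5800: 4.5}


def qoe(bitrates_kbps, rebuffer_s, mu, lam, previous_kbps=None):
    """Sum of quality minus weighted rebuffering and switch penalties."""
    total = 0.0
    last = previous_kbps
    for rate, stall in zip(bitrates_kbps, rebuffer_s):
        total += QUALITY[rate] - mu * stall
        if last is not None:
            total -= lam * abs(QUALITY[rate] - QUALITY[last])
        last = rate
    return total


MU, LAM = 4.0, 1.0  # hypothetical weights
no_stall = [0.0, 0.0, 0.0, 0.0]
print(qoe([2358] * 4, no_stall, MU, LAM))                  # 16.0
print(qoe([5800, 771, 5800, 771], no_stall, MU, LAM))      # 10.5
print(qoe([5800] * 4, [0.0, 2.0, 0.0, 0.0], MU, LAM))      # 10.0
```

With these weights a steady mid-quality stream scores higher than an oscillating stream, and also higher than a top-quality stream with a single 2-second stall.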

How the optimization works

Here is a concrete example. Suppose:

Current buffer: 15 seconds.
Recent throughput: [4, 4.5, 3.8, 4.2, 4.0] Mbps. Harmonic mean: ~4.09 Mbps.
Available bitrates: [0.5, 1.0, 2.0, 3.0, 5.0, 8.0] Mbps.
Segment duration: 4 seconds.
Prediction: assume ~4 Mbps for the next 5 segments.

A greedy algorithm (throughput-based) would pick 3.0 Mbps for every segment (the highest below 4 Mbps with a safety margin).

MPC considers all 6^5 = 7,776 possible bitrate sequences over 5 segments. For each sequence, it simulates the buffer evolution:

Segment 1 at 3 Mbps: segment size = 3 Mbps × 4 s = 12 Mbits. At predicted throughput 4 Mbps, download takes 12 / 4 = 3 seconds. During those 3 seconds, 3 seconds of video are consumed from the buffer. Buffer goes from 15 to 15 + 4 − 3 = 16 seconds (4 seconds of new video added, 3 seconds of playback consumed). Quality: quality(3M).

Segment 2 at 5 Mbps: segment size = 5 × 4 = 20 Mbits. Download takes 20 / 4 = 5 seconds. Buffer goes from 16 to 16 + 4 − 5 = 15 seconds. Quality: quality(5M). Switch penalty: |quality(5M) − quality(3M)|.

And so on.

MPC finds the sequence that maximizes total QoE. The optimal sequence might be: [3, 3, 3, 3, 3] (stable, no switches) or [2, 3, 3, 3, 5] (sacrifice one segment to grow buffer, then climb). The key: it can make a temporarily suboptimal choice (lower bitrate now) to gain buffer headroom for a higher bitrate later. A greedy algorithm cannot do this [2].
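A brute-force version of this search fits in a short script. It is a sketch under stated assumptions, not the implementation from [2]: the logarithmic quality curve, the weights MU and LAM, and the function names are all hypothetical. The ladder, segment duration, buffer level, and throughput samples are the ones from the example.

```python
import math
from itertools import product
from statistics import harmonic_mean

LADDER = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]  # Mbps, from the example
SEGMENT_S = 4.0
MU, LAM = 4.0, 1.0                       # hypothetical penalty weights


def quality(rate_mbps):
    """Assumed concave quality curve: log of bitrate relative to the lowest rung."""
    return math.log(rate_mbps / LADDER[0])


def simulate(plan, buffer_s, throughput_mbps):
    """Return (buffer level after each segment, total stall seconds)."""
    levels, stall_s = [], 0.0
    for rate in plan:
        download_s = rate * SEGMENT_S / throughput_mbps
        stall_s += max(0.0, download_s - buffer_s)
        buffer_s = max(0.0, buffer_s - download_s) + SEGMENT_S
        levels.append(buffer_s)
    return levels, stall_s


def plan_qoe(plan, buffer_s, throughput_mbps, last_rate):
    """QoE of a whole plan: quality minus stall and switch penalties."""
    _, stall_s = simulate(plan, buffer_s, throughput_mbps)
    rates = [last_rate, *plan]
    switches = sum(abs(quality(b) - quality(a)) for a, b in zip(rates, rates[1:]))
    return sum(quality(r) for r in plan) - MU * stall_s - LAM * switches


def mpc_decide(buffer_s, recent_mbps, last_rate, horizon=5):
    """Enumerate every plan, keep the best, and act only on its first entry."""
    predicted = harmonic_mean(recent_mbps)
    plans = product(LADDER, repeat=horizon)   # 6**5 = 7,776 candidates
    best = max(plans, key=lambda p: plan_qoe(p, buffer_s, predicted, last_rate))
    return best[0], best


print(simulate((3.0, 5.0), 15.0, 4.0))        # ([16.0, 15.0], 0.0)
first, plan = mpc_decide(15.0, [4, 4.5, 3.8, 4.2, 4.0], last_rate=3.0)
print(first, plan)
```

The first print reproduces the buffer trace worked out by hand above. Which plan wins depends heavily on the assumed quality curve and weights, so the second print should be read as an illustration of the mechanism rather than a recommended decision.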

Why MPC is better than both predecessors

Better than throughput-based: MPC uses throughput predictions but also accounts for the buffer as a safety margin. Because the simulation tracks the buffer explicitly, the optimizer steers away from bitrates that would empty it under the predicted throughput, and a healthy buffer can absorb moderate prediction errors.

Better than BBA: MPC can ramp up quality faster during startup, because it uses throughput estimates (which are available immediately) rather than waiting for the buffer to fill. It gets BBA’s stability (because the buffer is modeled) plus throughput-based ABR’s responsiveness (because throughput is estimated) [2].

Quantitative results: Yin et al. showed MPC improved average QoE by 20-30% over BBA and throughput-based algorithms across traces from real cellular networks, with fewer rebuffering events and fewer quality switches [2].

What broke: the prediction bottleneck

MPC is only as good as its throughput prediction. The algorithm assumes: “I can estimate future throughput accurately enough to plan 5 segments ahead.”

On cellular networks, this assumption fails catastrophically. A user driving through a city experiences throughput swings from 20 Mbps (strong LTE signal) to 500 kbps (entering a tunnel) within seconds. The harmonic mean of the past 5 segments tells you nothing about the next 5 [2][3].

When the prediction is wrong, MPC’s optimization produces a plan based on a fictitious future. The first action in that plan — the only one actually executed — may be the wrong choice. MPC re-plans every segment, so it can recover, but the damage of one bad prediction (a stall event) is immediate and visible to the user.

The deeper problem: you cannot hand-engineer a throughput predictor that works across all network conditions. Cellular, WiFi, wired broadband, congested shared links, VPN tunnels — each has different dynamics. Any fixed model (harmonic mean, autoregressive, etc.) is tuned for some conditions and wrong for others [3].

Act 4: Neural ABR (Pensieve) — Mao et al. 2017

The insight: learn the policy from experience

In 2017, Hongzi Mao and colleagues at MIT asked: instead of hand-designing the ABR control logic (rate map, optimization objective, throughput predictor), can you learn it directly from data? [3]

Pensieve uses reinforcement learning (RL). The idea:

State: The current buffer level, the throughput measurements of the past 8 segments, the download times of those segments, the bitrate of the last downloaded segment, the sizes of the next segment at each bitrate level, and the number of remaining segments in the video.
Action: Choose a bitrate for the next segment.
Reward: The same QoE function MPC uses — quality minus rebuffering penalty minus switch penalty.
Learning: Train a neural network (the “policy”) on thousands of simulated streaming sessions, using traces of real network throughput from cellular, WiFi, and broadband measurements. The network learns to map states to actions that maximize long-run reward.

The trained policy replaces the entire hand-designed ABR algorithm. No rate map. No optimization solver. No throughput predictor. Just: observe the current state, feed it to the neural network, get a bitrate decision.

Why learning helps

Implicit prediction. The neural network never explicitly computes a throughput forecast. But the network traces it trained on contain patterns — cellular throughput tends to drop after a high spike, wired connections are relatively stable. The policy has implicitly learned these patterns and makes decisions that account for them [3].

Environment-specific tuning. A policy trained on cellular traces behaves differently from one trained on wired traces. Pensieve can train specialized policies for different network environments — or train a general policy on mixed traces that handles all of them.

Less hand-engineering. The QoE weights (μ, λ) are still specified by the designer, but the mapping from QoE to bitrate decisions is learned, not engineered. This eliminates the fragile interaction between the predictor, the optimizer, and the rate map that makes MPC sensitive to design choices [3].

Quantitative results

Mao et al. evaluated Pensieve against MPC, BBA, and throughput-based ABR across a range of network traces (FCC broadband, Norway cellular, synthetic). Pensieve achieved 12-25% higher average QoE than MPC, primarily by reducing rebuffering events on highly variable traces while maintaining quality on stable ones [3].

What broke: the generalization question

Pensieve is not the end of the story. Three problems remain open:

Generalization. A policy trained on Norway cellular traces may not work well on Indian 4G networks with different variability patterns. The model learns the specific distribution of its training data. If deployment conditions differ, performance degrades [3]. This is the standard machine learning problem: training distribution must match deployment distribution.

Interpretability. MPC’s decisions are explainable: “I chose 3 Mbps because my throughput estimate is 4 Mbps, my buffer is 12 seconds, and the optimization says this maximizes QoE over the next 5 segments.” Pensieve’s decisions are opaque: “the neural network output was 3 Mbps.” When something goes wrong, you cannot diagnose why.

Fairness. Like all client-side ABR algorithms, Pensieve optimizes for one client in isolation. When multiple Pensieve clients share a bottleneck link, their learned policies may not converge to a fair allocation. Each client’s neural network was trained assuming it is the only one on the network. Cross-client interference — the same problem we flagged in L10 — remains unsolved [5].

Cross-client interference: the multi-agent problem

So far, we have discussed ABR as a single-client problem. But in any real deployment, multiple clients share bandwidth. What happens when each client independently runs its own ABR loop? [5][7]

The failure mode

Consider three Netflix clients on a shared 15 Mbps home broadband connection. Each independently measures throughput.

Initially: Each client gets ~5 Mbps. Each requests 3.5 Mbps video (below 5 Mbps with safety margin). Total demand: 10.5 Mbps. Spare capacity: 4.5 Mbps. All buffers grow.
Ramp up: All three clients see high throughput and growing buffers. All three independently decide to request 5 Mbps video. Total demand: 15 Mbps. No spare capacity. TCP’s congestion control distributes fairly — each gets 5 Mbps, just enough.
Perturbation: One client’s download finishes slightly early, freeing bandwidth. The other two see a brief throughput spike, estimate higher capacity, and request 8 Mbps video. Total demand: 21 Mbps on a 15 Mbps link.
Collapse: All three downloads slow dramatically. Buffers drain. All three panic and drop to 1 Mbps. Total demand: 3 Mbps. Massive underutilization.
Recovery: With only 3 Mbps demanded, each client sees 5 Mbps throughput again. Buffers grow. Repeat from the ramp-up step.

This is synchronized oscillation — a multi-agent version of the single-client oscillation from throughput-based ABR. All clients swing between high and low quality in lockstep [5].
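The demand figures in the cycle above can be tabulated with a few lines of Python. The per-client requests are taken from the scenario; the assumption that the first client stays at 5 Mbps during the perturbation is what reproduces the 21 Mbps total.

```python
CAPACITY_MBPS = 15.0

# Per-client video bitrates requested in each phase of the cycle.
PHASES = {
    "initial": [3.5, 3.5, 3.5],
    "ramp up": [5.0, 5.0, 5.0],
    "perturbation": [5.0, 8.0, 8.0],   # assumption: the first client stays at 5 Mbps
    "collapse": [1.0, 1.0, 1.0],
}


def summarize(requests_mbps, capacity_mbps=CAPACITY_MBPS):
    """Aggregate demand against the shared link capacity."""
    demand = sum(requests_mbps)
    return {
        "demand": demand,
        "spare": capacity_mbps - demand,   # negative means the link is oversubscribed
        "overloaded": demand > capacity_mbps,
    }


for name, requests in PHASES.items():
    print(name, summarize(requests))
```

The link swings from 6 Mbps oversubscribed in the perturbation phase to 12 Mbps idle in the collapse phase, even though the capacity itself never changed.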

Why it happens

Each ABR client is a feedback controller that treats the network as its environment. But the “environment” includes the other clients. Each client’s action (bitrate choice) changes the environment for every other client (available bandwidth). No client accounts for this — each optimizes as if it were alone. This is the same problem as multiple TCP flows competing for a bottleneck (L3), but with a 100x slower control loop.

Partial solutions

Hysteresis (BBA’s approach): use different thresholds for ramping up vs. ramping down. If the threshold to increase quality is higher than the threshold to decrease, clients are less likely to oscillate in sync [1].

Randomized backoff (borrowed from CSMA/CA): add a random delay to bitrate increase decisions. Different clients ramp up at different times, avoiding synchronized steps [5].

Server-side coordination (breaks the client-driven model): the CDN or ISP could track all active clients on a link and allocate bandwidth, similar to a cellular scheduler. This requires per-client state at the server — exactly the complexity DASH was designed to avoid [5][7].

None of these fully solves the problem. Cross-client ABR fairness remains an active research area.

The generational arc: from naive to learned

Generation	Algorithm	What it uses	What it ignores	What breaks
1 (2011)	Throughput-based	Past throughput	Buffer level	Oscillation, overestimation, underutilization
2 (2014)	BBA (Huang) [1]	Buffer level	Throughput	Slow startup, wasted bandwidth on fast links
3 (2015)	MPC (Yin) [2]	Both + lookahead	Cross-client effects	Prediction errors on volatile networks
4 (2017)	Pensieve (Mao) [3]	Learned policy from data	— (learns everything from traces)	Generalization, interpretability, fairness

Each generation solved the previous generation’s failure mode. Each introduced a new one. The trajectory mirrors every arc in this course:

Medium access: ALOHA (no info) -> CSMA (sense channel) -> CSMA/CA (ACK feedback) -> OFDMA (centralized scheduling).
TCP congestion control: AIMD (reactive) -> Vegas (delay-based) -> BBR (model-based).
ABR: throughput (one signal) -> BBA (different signal) -> MPC (both + optimization) -> Pensieve (learned).

The pattern: more information at each step, more sophisticated use of that information, better performance — but new failure modes at the boundary of the model’s assumptions.

What’s actually deployed — and what broke next

The sobering reality: Puffer (2020)

The most important result in ABR research since Pensieve came not from a new algorithm but from a real-world experiment. Francis Yan and colleagues at Stanford built Puffer, a live TV streaming platform that randomly assigned real viewers to different ABR algorithms and measured their actual experience while streaming 38.6 years of cumulative video to 63,508 participants [10].

The result was sobering: MPC and Pensieve did not consistently outperform BBA in the real world [10]. The algorithms that dominated in simulation — trained on curated network traces, evaluated on controlled testbeds — failed to maintain their advantage when confronted with the full diversity of real networks. The only algorithm to consistently beat BBA was Fugu, an online-learning approach that updated its neural network continuously using real throughput data rather than relying on pre-trained policies.

This finding catalyzed a rethinking of ABR research. The problem was not just “which algorithm is best?” but “can we trust our evaluations at all?”

Three research directions that followed

1. Fixing the evaluation methodology. Alomar et al. (MIT, NSDI 2023 Best Paper) showed that trace-driven ABR simulation is fundamentally biased: traces collected under one ABR policy don’t accurately predict how a different policy would perform. Their system, CausalSim, uses causal inference to remove this bias, reducing simulation error by 53-61% [11].

2. Making learned ABR robust. Patel et al. (CoNEXT 2024 Best Paper) identified that Pensieve-style RL training suffers from skewed input distributions — the training traces don’t represent the full range of real conditions. Their system Gelato/Plume fixes this, and was evaluated over 59 stream-years of real data [12]. Separately, ABR-Arena (2025) tested RL-based ABR across global locations and found that algorithms excelling on Puffer had stall ratios 265-277% higher in other environments — the generalization problem remains unsolved [13].

3. Production-scale ABR with guarantees. SODA (Chen et al., SIGCOMM 2024), developed with Amazon Prime Video, uses Lyapunov optimization to provide theoretical performance bounds — something no prior algorithm (BBA, MPC, or Pensieve) offered. It achieves 10-28% QoE improvement and is deployed at Amazon scale [14].

Beyond ABR: what Netflix actually optimized

While researchers focused on client-side ABR algorithms, Netflix invested heavily on the server side — optimizing what the ABR algorithm chooses from. Their Dynamic Optimizer computes per-shot encoding parameters, allocating more bits to visually complex scenes and fewer to simple ones. Combined with AV1 adoption (now powering 30% of Netflix streaming as of 2025, delivering 45% fewer buffering interruptions than H.264 [15]), the encoding ladder itself has become content-adaptive. A well-designed encoding ladder can make even a simple BBA algorithm perform excellently — the best ABR algorithm in the world cannot compensate for a poorly encoded bitrate ladder.

What this means for the field

The ABR story is not “Pensieve solved it.” It is:

BBA (2014) solved the stability problem with a simple, deployable insight.
MPC (2015) showed how to optimize formally, but its predictions failed on volatile networks.
Pensieve (2017) replaced hand-engineering with learning, but didn’t generalize.
Puffer (2020) showed none of the lab champions reliably won in the real world.
The current frontier is split: fix evaluation (CausalSim), fix training (Gelato), provide guarantees (SODA), or optimize the encoding ladder instead (Netflix).

The lesson parallels medium access: the “best” algorithm depends on the deployment environment, and real-world performance often surprises researchers who optimized in simulation.

Connecting backward and forward

Backward: the ABR loop is a feedback controller

The ABR control loop is structurally identical to TCP congestion control and CSMA/CA:

	TCP (L3)	CSMA/CA (L6)	ABR (L10-L11)
What is controlled?	Sending rate	Transmission timing	Video bitrate
What is observed?	ACKs, RTT, loss	Channel busy/idle, ACKs	Chunk download time, buffer level
Loop period	~1 RTT (50-100 ms)	~slot time (9 us)	~1 segment (2-10 s)
Overreaction failure	Throughput collapse	Collision storm	Quality oscillation
Underreaction failure	Wasted bandwidth	Idle channel	Unnecessary low quality

The same tradeoff between responsiveness and stability appears in all three. The same tools — smoothing, hysteresis, prediction, optimization — are applied in all three. The ABR loop is slower and coarser, but the structure is identical.

Forward: L12 — when buffering is forbidden

Today’s entire lecture assumed a fundamental luxury: the buffer. DASH clients hold 60-200 seconds of video. The buffer absorbs network variability, gives the ABR algorithm time to react, and makes stalls survivable.

What happens when you cannot buffer? A Zoom call requires one-way end-to-end delay below roughly 150 milliseconds [8]. A 30-second buffer adds 30 seconds of delay, which is completely unusable for conversation. The jitter buffer in VoIP holds 50-200 milliseconds: barely enough to smooth out packet-level variability, far too small for segment-level adaptation.

L12 asks: when the time constraint eliminates the buffer, what tools remain? How does the receiver reconstruct smooth audio from packets that arrive irregularly? How does the sender learn about network conditions when there is no time for feedback? We will study jitter buffers, RTP/RTCP in depth, adaptive playout, and modern real-time systems — WebRTC [9] and Zoom — that push the time constraint to its extreme.

Generative exercise

Design challenge (choose one):

Option A: Design a bitrate ladder. You are encoding a 2-hour action movie for a streaming service. Your users have connections ranging from 500 kbps (rural cellular) to 50 Mbps (fiber). Design a bitrate ladder: how many quality levels? What bitrates? What resolution at each level? Justify your spacing — why these specific steps?

Consider:

Diminishing returns in quality above ~5 Mbps for 1080p.
The lowest rung must be watchable (not just “technically plays”).
Each step must offer a perceptible improvement — if users cannot tell the difference between two rungs, you are wasting encoding and storage cost.
Action content compresses less efficiently than a lecture (more motion = more bits for the same quality).

Option B: Diagnose an ABR failure. A user reports: “My Netflix kept freezing every 30 seconds for about 2 seconds, then recovering. My internet speed test shows 10 Mbps.” Walk through the ABR loop step by step. What could cause periodic rebuffering on a link that appears fast? Consider: shared bottleneck (roommate downloading), throughput measurement timing (speed test vs. segment download), ABR algorithm choice (throughput-based vs. BBA vs. MPC), and segment duration.

References

[1] Huang, T.-Y., Johari, R., McKeown, N., Trunnell, M., and Watson, M. (2014). “A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service.” Proc. ACM SIGCOMM.

[2] Yin, X., Jindal, A., Sekar, V., and Sinopoli, B. (2015). “A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP.” Proc. ACM SIGCOMM.

[3] Mao, H., Netravali, R., and Alizadeh, M. (2017). “Neural Adaptive Video Streaming with Pensieve.” Proc. ACM SIGCOMM.

[4] Gupta, A. (2026). A First-Principles Approach to Networked Systems, Ch. 7: Multimedia Applications. UC Santa Barbara.

[5] Akhshabi, S., Begen, A. C., and Dovrolis, C. (2011). “An Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive Streaming over HTTP.” Proc. ACM Multimedia Systems.

[6] Dobrian, F. et al. (2011). “Understanding the Impact of Video Quality on User Engagement.” Proc. ACM SIGCOMM.

[7] Kurose, J. F. and Ross, K. W. (2021). Computer Networking, 8th Edition. Pearson.

[8] ITU-T Recommendation G.114 (2003). “One-way transmission time.” International Telecommunication Union.

[9] Alvestrand, H. (2021). “Overview: Real-Time Protocols for Browser-Based Applications.” RFC 8825.

[10] Yan, F., Ayers, H., Zhu, C., Fouladi, S., Hong, J., Zhang, K., Levis, P., and Winstein, K. (2020). “Learning in situ: A Randomized Experiment in Video Streaming.” Proc. USENIX NSDI.

[11] Alomar, A., Hamadanian, P., Nasr-Esfahany, A., Agarwal, A., Alizadeh, M., and Shah, D. (2023). “CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation.” Proc. USENIX NSDI (Best Paper).

[12] Patel, S., Zhang, J., Narodytska, N., and Abdu Jyothi, S. (2024). “Gelato: Practically High Performant Neural Adaptive Video Streaming.” Proc. ACM CoNEXT (Best Paper).

[13] Hoffman, B. et al. (2025). “Into the Wild: Real-World Testing for ML-Based ABR.” PACMI Workshop.

[14] Chen, T., Lin, Y., Christianson, N., Akhtar, Z., Dharmaji, S., Hajiesmaili, M., Wierman, A., and Sitaraman, R. (2024). “SODA: An Adaptive Bitrate Controller for Consistent High-Quality Video Streaming.” Proc. ACM SIGCOMM.

[15] Netflix Technology Blog (2025). “AV1 — Now Powering 30% of Netflix Streaming.”