The Streaming Control Loop

CS176C — Advanced Topics in Internet Computing

Arpit Gupta

2026-05-12

Where We Left Off

L10: DASH gives the client a closed-loop control system — one bitrate decision per segment.

Recall: Video is chopped into 2–10 second segments, each encoded at multiple bitrates. The client downloads one segment at a time and chooses its bitrate based on current conditions.

Today’s question: How should the client choose — and what happens when it gets it wrong?

What Are We Optimizing For?

The QoE Hierarchy

Two Netflix Sessions, Same Average Bitrate

Session A: Steady 720p for 2 hours. Never freezes.

Session B: Oscillates 1080p ↔︎ 480p every 10 seconds. Freezes twice.

Which does the user prefer?

A — overwhelmingly. Dobrian et al. (2011), millions of real sessions:

1% more rebuffering → 3% more abandonment
Users tolerate 3–5 seconds of startup delay
But a 1-second freeze at minute 20 feels broken

The QoE Hierarchy

Every ABR algorithm must satisfy this priority order:

No freezes — highest priority. A stall is catastrophic.

Stable quality — second priority. Oscillation is jarring.

High quality — third priority. Subject to the first two.

A 720p stream that never stalls beats a 4K stream that freezes twice.

Every algorithm today is an attempt to satisfy this hierarchy.
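One way to read the priority order is as a lexicographic comparison: compare stalls first, then switches, then quality. A minimal sketch, assuming a hypothetical Session record and made-up counts for the two sessions (the slide gives only "freezes twice" and "every 10 seconds"):

```python
from dataclasses import dataclass

@dataclass
class Session:
    name: str
    stalls: int       # number of freezes
    switches: int     # number of quality changes
    height: int       # representative vertical resolution, e.g. 720

def qoe_key(s: Session):
    # Tuples compare left to right, so stalls dominate switches,
    # and switches dominate quality. Lower key = better session.
    return (s.stalls, s.switches, -s.height)

# Session A: steady 720p, no freezes.
session_a = Session("A", stalls=0, switches=0, height=720)
# Session B: 2 hours of switching every 10 s is roughly 720 switches (assumed count);
# height is set equal to A to mirror "same average bitrate".
session_b = Session("B", stalls=2, switches=720, height=720)

ranked = sorted([session_a, session_b], key=qoe_key)
print([s.name for s in ranked])  # best first
```

Under this ordering a 720p stream with zero stalls ranks ahead of a 2160p stream with two stalls, whatever the quality gap. Real QoE models weight the terms rather than ranking them strictly; the strict ordering here is a simplification.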

Act 1: Throughput-Based ABR (2011)

The Obvious First Attempt

The Algorithm

Download segment. Measure throughput.
Smooth: estimate = 0.1 × latest measurement + 0.9 × previous estimate
Pick the highest bitrate at or below a safety fraction (~85%) of the estimate
Repeat every 2–10 seconds
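The loop above can be sketched in a few lines. This is an illustration only: the bitrate ladder is hypothetical, and the function names are ours, not from any player implementation.

```python
LADDER_MBPS = [0.3, 0.75, 1.5, 3.0, 5.0, 8.0]  # hypothetical bitrate ladder

def update_estimate(previous, latest, alpha=0.1):
    """Smoothing step from the slide: alpha * latest + (1 - alpha) * previous."""
    return alpha * latest + (1 - alpha) * previous

def pick_bitrate(estimate, ladder=LADDER_MBPS, safety=0.85):
    """Highest rung at or below safety * estimate; lowest rung if none fits."""
    budget = safety * estimate
    candidates = [r for r in ladder if r <= budget]
    return max(candidates) if candidates else min(ladder)

# The link drops from 5 Mbps to 1 Mbps; each new measurement reads 1.0.
estimate = 5.0
for latest in [1.0, 1.0, 1.0]:
    estimate = update_estimate(estimate, latest)
    print(round(estimate, 2), pick_bitrate(estimate))
```

With a weight of only 0.1 on the latest sample, the estimate decays slowly: across these three segments the client still picks the 3.0 Mbps rung on a 1 Mbps link.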

Question: What could go wrong?

(Think for 30 seconds before we continue.)

Three Ways It Breaks

1. Overestimation: Network drops 5 → 1 Mbps. Client has already requested a 5 Mbps segment (4 s of video, 20 Mbits). At 1 Mbps the download takes 20 seconds. The 10 s buffer empties at second 10. Stall.

2. Oscillation: Picks 3 Mbps → fast download → measures “5 Mbps” → picks 5 Mbps → slow download → measures “3 Mbps” → repeat. Quality swings every segment.

3. Underutilization: Safety factor set to 60% of the estimate. On a 10 Mbps link, the client requests 6 Mbps and leaves 40% of capacity unused.

Root cause: Past throughput ≠ future throughput. The prediction is often wrong.
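The overestimation numbers in item 1 can be checked directly. A small sketch, assuming a 4-second segment (so 5 Mbps × 4 s = 20 Mbits), no other downloads in flight, and playback draining one second of buffer per second:

```python
def download_and_stall(segment_mbits, link_mbps, buffer_s):
    """Return (download time, stall time) in seconds for one segment."""
    download_s = segment_mbits / link_mbps
    # Playback freezes for whatever part of the download outlasts the buffer.
    stall_s = max(0.0, download_s - buffer_s)
    return download_s, stall_s

segment_mbits = 5 * 4  # 5 Mbps encoding, 4 s of video (assumed segment length)
download_s, stall_s = download_and_stall(segment_mbits, link_mbps=1, buffer_s=10)
print(download_s, stall_s)  # 20.0 10.0
```

The buffer runs dry 10 seconds into a 20-second download, so the viewer watches a frozen frame for the remaining 10 seconds.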

TCP faces the same challenge (L3), but it adjusts every 50–100 ms. DASH adjusts every 2–10 seconds: a loop roughly 100× slower.

Act 2: BBA — Stop Predicting (2014)

Huang et al., deployed at Netflix

The Bathtub Insight

BBA’s idea: Ignore throughput entirely. Use buffer level as the only input.

Think of the buffer as a bathtub:

Water flows in (downloads), drains out (playback at constant rate)
Don’t measure the faucet — just look at the water level

The rate map:

Buffer < 10s → lowest bitrate (survival mode)
Buffer > 60s → highest bitrate (earned it)
In between → interpolate linearly

No prediction. No estimation error. Netflix result: 10–20% fewer stalls.
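The rate map is simple enough to write down. A minimal sketch using the 10 s and 60 s thresholds from the slide; the ladder is hypothetical, and snapping down to the nearest rung is a simplification (the deployed algorithm adds hysteresis to avoid flapping between rungs):

```python
LADDER_KBPS = [145, 400, 1000, 2500, 5000]  # hypothetical bitrate ladder

def bba_rate(buffer_s, ladder=LADDER_KBPS, lower_s=10.0, upper_s=60.0):
    """Map buffer occupancy (seconds) to a bitrate. Throughput is never consulted."""
    r_min, r_max = min(ladder), max(ladder)
    if buffer_s <= lower_s:
        return r_min            # survival mode
    if buffer_s >= upper_s:
        return r_max            # buffer is full enough for top quality
    # Linear interpolation between the lowest and highest bitrate.
    frac = (buffer_s - lower_s) / (upper_s - lower_s)
    target = r_min + frac * (r_max - r_min)
    # Snap down to the highest rung that does not exceed the target.
    return max(r for r in ladder if r <= target)

for level in [0, 10, 35, 60, 90]:
    print(level, bba_rate(level))
```

Note what the function signature leaves out: there is no throughput argument, so an empty buffer maps to the lowest rung no matter how fast the network is.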

What Broke: The Cold Start

Question: User presses play. Buffer = 0. What does BBA do?

Requests 145 kbps — blocky and blurry. Even on a 20 Mbps network.

The buffer must fill before BBA climbs the linear ramp: 10–15 seconds of mediocre quality.

	Throughput-based	BBA
Startup	Fast (1 segment)	Slow (10–15s)
Stability	Oscillates	Stable

Can we get both? → That’s exactly what the next two generations tried.

Beyond BBA: Two More Generations

MPC (2015) and Pensieve (2017) — Summarized

MPC and Pensieve: Combining Signals

	Throughput-based	BBA (2014)	MPC (2015)	Pensieve (2017)
Uses	Past throughput	Buffer level	Both + lookahead	Learned policy
Strength	Fast startup	Stable, no prediction	Plans ahead, formal QoE optimization	Adapts to any network pattern
Weakness	Oscillation	Slow startup	Prediction errors on volatile networks	Doesn’t generalize across environments
Key idea	Measure → match	Observe → react	Model → predict → optimize	Train on traces → deploy

MPC formalizes the QoE hierarchy as an optimization: maximize quality − μ × stalls − λ × switches. It evaluates all bitrate sequences over the next 5 segments, picks the best, commits only to the first decision, and re-plans every segment.
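A brute-force version of this lookahead fits in a short sketch. Everything numeric here is an assumption for illustration: the ladder, the 4-second segment length, the weights μ = 10 and λ = 1, bitrate as the quality measure, and a single constant throughput prediction over the horizon. The published controller is more careful on each of these points.

```python
from itertools import product

LADDER_MBPS = [1.0, 3.0, 5.0]  # hypothetical bitrate ladder
SEGMENT_S = 4.0                # assumed segment duration

def plan_score(plan, buffer_s, predicted_mbps, last_rate, mu, lam):
    """quality - mu * stalls - lam * switches for one candidate bitrate sequence."""
    quality = stalls = switches = 0.0
    for rate in plan:
        download_s = rate * SEGMENT_S / predicted_mbps
        stalls += max(0.0, download_s - buffer_s)               # buffer ran dry
        buffer_s = max(0.0, buffer_s - download_s) + SEGMENT_S  # drain, then add segment
        quality += rate
        switches += abs(rate - last_rate)
        last_rate = rate
    return quality - mu * stalls - lam * switches

def mpc_choose(buffer_s, predicted_mbps, last_rate, horizon=5, mu=10.0, lam=1.0):
    """Score every sequence over the horizon; return only the first decision."""
    best = max(
        product(LADDER_MBPS, repeat=horizon),
        key=lambda plan: plan_score(plan, buffer_s, predicted_mbps, last_rate, mu, lam),
    )
    return best[0]

print(mpc_choose(buffer_s=20.0, predicted_mbps=10.0, last_rate=5.0))
print(mpc_choose(buffer_s=4.0, predicted_mbps=1.0, last_rate=1.0))
```

With 3 rungs and a horizon of 5 there are 3^5 = 243 sequences to score per decision. The plan is only as good as predicted_mbps: if the prediction is wrong, the stall term is computed against a network that does not exist.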

Pensieve replaces the hand-engineered optimizer with a neural network trained via reinforcement learning. 12–25% QoE gain over MPC in simulation.

But do they work in the real world?

The Sobering Reality: Puffer (2020)

Stanford. 63,508 real viewers. 38.6 years of video. Random algorithm assignment.

MPC and Pensieve did not consistently outperform BBA in the real world.

Lab champions ≠ field champions. The only consistent winner: Fugu — online learning updated continuously from real data, not pre-trained policies.

What happened next:

CausalSim (NSDI ’23 Best Paper): trace-driven simulation is biased — prior comparisons may have been wrong
SODA (SIGCOMM ’24, Amazon Prime Video): ABR with theoretical guarantees, deployed at scale
Netflix: optimized the encoding ladder instead — per-shot encoding + AV1 (30% of streams, 45% fewer stalls)

Lesson: Sometimes optimizing the input (what ABR chooses from) matters more than optimizing the controller (ABR itself).

The Generational Arc

Year	Algorithm	Uses	Breaks
2011	Throughput-based	Past throughput	Oscillation
2014	BBA	Buffer level	Slow startup
2015	MPC	Both + lookahead	Prediction errors
2017	Pensieve	Learned policy	Generalization
2020	Puffer/Fugu	Online learning	Still open

Same arc as medium access: ALOHA → CSMA → CSMA/CA → OFDMA.

More information → better control → new failure at the boundary.

Same structure as TCP (L3) and CSMA/CA (L6): observe → estimate → decide → act. ABR is the slowest loop (2–10 s vs. TCP’s 50–100 ms vs. CSMA/CA’s 9 μs).

What Comes Next

L12 (Thursday): Take the buffer away entirely.

Netflix adapts every 2–10 seconds. Zoom must adapt every millisecond.

150 ms total budget. Everything from today — large buffers, 10-second segments, per-segment ABR — is unusable.

What remains when buffering is forbidden?