The Streaming Control Loop

CS176C — Advanced Topics in Internet Computing

Arpit Gupta

2026-05-12

Where We Left Off

L10: DASH gives the client a closed-loop control system — one bitrate decision per segment.

Recall: Video is chopped into 2–10 second segments, each encoded at multiple bitrates. The client downloads one segment at a time, choosing the bitrate based on current conditions.

Today’s question: How should the client choose — and what happens when it gets it wrong?

What Are We Optimizing For?

The QoE Hierarchy

Two Netflix Sessions, Same Average Bitrate

Session A: Steady 720p for 2 hours. Never freezes.

Session B: Oscillates 1080p ↔︎ 480p every 10 seconds. Freezes twice.

Which does the user prefer?

A — overwhelmingly. Dobrian et al. (2011), millions of real sessions:

  • 1% more rebuffering → 3% more abandonment
  • Users tolerate 3–5 seconds of startup delay
  • But a 1-second freeze at minute 20 feels broken

The QoE Hierarchy

Every ABR algorithm must satisfy this priority order:

  1. No freezes — highest priority. A stall is catastrophic.
  2. Stable quality — second priority. Oscillation is jarring.
  3. High quality — third priority. Subject to the first two.

A 720p stream that never stalls beats a 4K stream that freezes twice.

Every algorithm we cover today is an attempt to satisfy this hierarchy.
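One way to read the priority order is as a lexicographic comparison: stalls are compared first, switches only break ties on stalls, and quality only breaks ties on both. The sketch below assumes that strict ordering purely for illustration (deployed systems trade the three off with weights), and the session numbers and the `Session` fields are hypothetical stand-ins for Sessions A and B.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    stalls: int        # number of playback freezes
    switches: int      # number of bitrate changes
    avg_mbps: float    # average bitrate, a crude proxy for quality


def hierarchy_key(s: Session):
    # Tuples compare left to right, so stalls dominate switches,
    # and switches dominate quality (negated because higher is better).
    return (s.stalls, s.switches, -s.avg_mbps)


# Hypothetical numbers: same average bitrate, very different experience.
session_a = Session("A", stalls=0, switches=0, avg_mbps=3.0)
# Two hours of switching every 10 seconds is roughly 720 switches.
session_b = Session("B", stalls=2, switches=720, avg_mbps=3.0)

preferred = min([session_a, session_b], key=hierarchy_key)
print(preferred.name)
```

Because the stall count is compared first, Session A wins here regardless of what the later fields contain.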

Act 1: Throughput-Based ABR (2011)

The Obvious First Attempt

The Algorithm

  1. Download segment. Measure throughput.
  2. Smooth: estimated = 0.1 × latest + 0.9 × previous
  3. Pick the highest bitrate safely below the estimate (at most ~85% of it)
  4. Repeat every 2–10 seconds
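The four steps above can be sketched in a few lines. This is a minimal illustration, not any player's actual implementation: the bitrate ladder and the function names are assumptions, while the 0.1/0.9 smoothing weights and the ~85% safety factor come from the steps above.

```python
# Hypothetical bitrate ladder in Mbps, ascending.
BITRATE_LADDER_MBPS = [0.3, 0.75, 1.5, 3.0, 5.0]
ALPHA = 0.1     # weight on the latest throughput sample (step 2)
SAFETY = 0.85   # request at most ~85% of the estimate (step 3)


def update_estimate(previous, latest, alpha=ALPHA):
    """Exponentially weighted moving average of throughput samples."""
    if previous is None:          # first segment: nothing to smooth against
        return latest
    return alpha * latest + (1 - alpha) * previous


def pick_bitrate(estimate, ladder=BITRATE_LADDER_MBPS, safety=SAFETY):
    """Highest rung at or below safety * estimate, else the lowest rung."""
    budget = safety * estimate
    candidates = [rate for rate in ladder if rate <= budget]
    return max(candidates) if candidates else ladder[0]


# Step 4: repeat once per segment. Here the network drops from 5 to 1 Mbps.
estimate = None
for measured_mbps in [5.0, 5.0, 1.0]:
    estimate = update_estimate(estimate, measured_mbps)
    print(round(estimate, 2), pick_bitrate(estimate))
```

With these assumed samples, the estimate after the drop to 1 Mbps is still about 4.6 Mbps, so the client requests the 3 Mbps rung on a 1 Mbps link. Heavy smoothing makes the estimate stable but slow to notice bad news.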

Question: What could go wrong?

(Think for 30 seconds before we continue.)

Three Ways It Breaks

1. Overestimation: Network drops 5 → 1 Mbps. Client, still trusting the old estimate, requests a 5 Mbps segment (4 seconds of video = 20 Mbits). At 1 Mbps: download = 20 seconds. Buffer (10s) empties at second 10. Playback stalls for the remaining 10 seconds.

2. Oscillation: Picks 3 Mbps → fast download → measures “5 Mbps” → picks 5 Mbps → slow download → measures “3 Mbps” → repeat. Quality swings every segment.

3. Underutilization: Safety factor lowered to 60% to avoid the first two failures. On a stable 10 Mbps link, the client requests at most 6 Mbps and leaves 40% of capacity unused.
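The overestimation arithmetic can be checked directly. The sketch below assumes the buffer drains at one second of video per second and that nothing else arrives during the download; the 4-second segment duration is inferred from 20 Mbits at 5 Mbps, and the function name is hypothetical.

```python
def stall_seconds(segment_mbits, link_mbps, buffer_seconds):
    """Seconds playback freezes while a single segment downloads.

    Assumes playback drains the buffer in real time and no other
    segment arrives during the download.
    """
    download_seconds = segment_mbits / link_mbps
    return max(0.0, download_seconds - buffer_seconds)


segment_mbits = 5.0 * 4.0   # 5 Mbps encoding x 4 s of video = 20 Mbits

# Link collapsed to 1 Mbps: 20 s download against a 10 s buffer.
print(stall_seconds(segment_mbits, link_mbps=1.0, buffer_seconds=10.0))  # 10.0

# Link still at 5 Mbps: 4 s download, buffer never empties.
print(stall_seconds(segment_mbits, link_mbps=5.0, buffer_seconds=10.0))  # 0.0
```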

Root cause: Past throughput ≠ future throughput. The prediction is often wrong.

TCP faces the same challenge (L3) but adjusts every 50–100 ms. DASH adjusts every 2–10 seconds: a control loop roughly 100× slower.

Act 2: BBA — Stop Predicting (2014)

Huang et al., deployed at Netflix

The Bathtub Insight

BBA’s idea: Ignore throughput entirely. Use buffer level as the only input.

Think of the buffer as a bathtub:

  • Water flows in (downloads), drains out (playback at constant rate)
  • Don’t measure the faucet — just look at the water level

The rate map:

  • Buffer < 10s → lowest bitrate (survival mode)
  • Buffer > 60s → highest bitrate (earned it)
  • In between → interpolate linearly

No prediction. No estimation error. Netflix result: 10–20% fewer stalls.
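A minimal sketch of the rate map, using the 10-second and 60-second thresholds above. The bitrate ladder and function names are assumptions for illustration, and the sketch omits refinements of the published algorithm (such as holding the current rate until the map crosses the next rung), so it should be read as the shape of the idea rather than the deployed logic.

```python
LADDER_KBPS = [145, 400, 800, 1600, 3200, 6000]  # hypothetical ladder, ascending
RESERVOIR_S = 10.0   # below this: lowest bitrate
UPPER_S = 60.0       # above this: highest bitrate


def target_rate_kbps(buffer_s, r_min, r_max):
    """Piecewise-linear map from buffer level (seconds) to a target rate."""
    if buffer_s <= RESERVOIR_S:
        return r_min
    if buffer_s >= UPPER_S:
        return r_max
    frac = (buffer_s - RESERVOIR_S) / (UPPER_S - RESERVOIR_S)
    return r_min + frac * (r_max - r_min)


def bba_pick(buffer_s, ladder=LADDER_KBPS):
    """Highest ladder rung that does not exceed the mapped target rate."""
    target = target_rate_kbps(buffer_s, ladder[0], ladder[-1])
    return max(rate for rate in ladder if rate <= target)


for level in [0, 5, 20, 35, 50, 70]:
    print(level, bba_pick(level))
```

Note that throughput never appears as an input: the only state the decision reads is the buffer level.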

What Broke: The Cold Start

Question: User presses play. Buffer = 0. What does BBA do?

It requests the lowest bitrate, 145 kbps: blocky and blurry, even on a 20 Mbps network.

The buffer must fill through the linear ramp before quality catches up: 10–15 seconds of mediocre quality.
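The cold start can be reproduced with a small simulation. Everything numeric here except the 145 kbps floor, the 20 Mbps link, and the 10 s / 60 s thresholds is an assumption: the ladder, the 4-second segments, and a player that fetches segments back to back. The point is only that the first picks ignore the fast link entirely.

```python
LADDER_KBPS = [145, 400, 800, 1600, 3200, 6000]  # hypothetical ladder
RESERVOIR_S, UPPER_S = 10.0, 60.0
SEGMENT_S = 4.0        # assumed segment duration
LINK_KBPS = 20_000     # the 20 Mbps network


def bba_pick(buffer_s, ladder=LADDER_KBPS):
    """Buffer-only rate map: lowest below 10 s, highest above 60 s, linear between."""
    if buffer_s <= RESERVOIR_S:
        return ladder[0]
    if buffer_s >= UPPER_S:
        return ladder[-1]
    frac = (buffer_s - RESERVOIR_S) / (UPPER_S - RESERVOIR_S)
    target = ladder[0] + frac * (ladder[-1] - ladder[0])
    return max(rate for rate in ladder if rate <= target)


def simulate_cold_start(num_segments):
    """Bitrates chosen for the first segments, starting from an empty buffer."""
    buffer_s, picks = 0.0, []
    for _ in range(num_segments):
        rate = bba_pick(buffer_s)
        download_s = rate * SEGMENT_S / LINK_KBPS
        # Playback drains the buffer during the download, then the
        # finished segment adds SEGMENT_S seconds of video.
        buffer_s = max(0.0, buffer_s - download_s) + SEGMENT_S
        picks.append(rate)
    return picks


picks = simulate_cold_start(8)
low_quality_s = SEGMENT_S * picks.count(LADDER_KBPS[0])
print(picks, low_quality_s)
```

Under these assumptions the first four segments (16 seconds of video) are fetched at the lowest rung even though each download takes only a few hundredths of a second, which is the same order as the 10–15 seconds quoted above.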

           | Throughput-based | BBA
Startup    | Fast (1 segment) | Slow (10–15s)
Stability  | Oscillates       | Stable

Can we get both? → That’s exactly what the next two generations tried.

Beyond BBA: Two More Generations

MPC (2015) and Pensieve (2017) — Summarized

MPC and Pensieve: Combining Signals

          | Throughput-based | BBA (2014)            | MPC (2015)                             | Pensieve (2017)
Uses      | Past throughput  | Buffer level          | Both + lookahead                       | Learned policy
Strength  | Fast startup     | Stable, no prediction | Plans ahead, formal QoE optimization   | Adapts to any network pattern
Weakness  | Oscillation      | Slow startup          | Prediction errors on volatile networks | Doesn’t generalize across environments
Key idea  | Measure → match  | Observe → react       | Model → predict → optimize             | Train on traces → deploy

MPC formalizes the QoE hierarchy as an optimization: quality − μ × stalls − λ × switches. Evaluates all bitrate sequences over 5 segments. Picks the best. Re-plans every segment.
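The receding-horizon idea can be sketched as a brute-force search. This is an illustration under stated assumptions, not the published controller: the ladder, segment duration, the weights `MU` and `LAMBDA`, a single constant throughput prediction, and bitrate in Mbps as the quality term are all hypothetical choices. Only the objective shape (quality − μ × stalls − λ × switches) and the 5-segment horizon come from the description above.

```python
from itertools import product

LADDER_MBPS = [0.3, 0.75, 1.5, 3.0, 5.0]  # hypothetical ladder
SEGMENT_S = 4.0     # assumed segment duration
HORIZON = 5         # look ahead 5 segments
MU = 4.3            # assumed penalty per second of stall
LAMBDA = 1.0        # assumed penalty per Mbps of bitrate change


def plan_qoe(plan, buffer_s, last_rate, predicted_mbps):
    """QoE of one candidate bitrate sequence under a constant throughput prediction."""
    qoe = 0.0
    for rate in plan:
        download_s = rate * SEGMENT_S / predicted_mbps
        stall_s = max(0.0, download_s - buffer_s)
        buffer_s = max(0.0, buffer_s - download_s) + SEGMENT_S
        qoe += rate - MU * stall_s - LAMBDA * abs(rate - last_rate)
        last_rate = rate
    return qoe


def mpc_pick(buffer_s, last_rate, predicted_mbps):
    """Score every sequence over the horizon; commit only to its first step."""
    best_plan = max(
        product(LADDER_MBPS, repeat=HORIZON),
        key=lambda plan: plan_qoe(plan, buffer_s, last_rate, predicted_mbps),
    )
    return best_plan[0]


print(mpc_pick(buffer_s=20.0, last_rate=5.0, predicted_mbps=10.0))
print(mpc_pick(buffer_s=2.0, last_rate=0.3, predicted_mbps=0.5))
```

With 5 rungs and a horizon of 5 there are 5^5 = 3125 sequences to score, cheap enough to redo before every segment. The whole plan rests on `predicted_mbps`, so a wrong prediction produces a confidently wrong plan.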

Pensieve replaces the hand-engineered optimizer with a neural network trained via reinforcement learning. 12–25% QoE gain over MPC in simulation.

But do they work in the real world?

The Sobering Reality: Puffer (2020)

Stanford. 63,508 real viewers. 38.6 years of video. Random algorithm assignment.

MPC and Pensieve did not consistently outperform BBA in the real world.

Lab champions ≠ field champions. The only consistent winner: Fugu — online learning updated continuously from real data, not pre-trained policies.

What happened next:

  • CausalSim (NSDI ’23 Best Paper): trace-driven simulation is biased — prior comparisons may have been wrong
  • SODA (SIGCOMM ’24, Amazon Prime Video): ABR with theoretical guarantees, deployed at scale
  • Netflix: optimized the encoding ladder instead — per-shot encoding + AV1 (30% of streams, 45% fewer stalls)

Lesson: Sometimes optimizing the input (what ABR chooses from) matters more than optimizing the controller (ABR itself).

The Generational Arc

Gen (year) | Algorithm        | Uses             | Breaks
2011       | Throughput-based | Past throughput  | Oscillation
2014       | BBA              | Buffer level     | Slow startup
2015       | MPC              | Both + lookahead | Prediction errors
2017       | Pensieve         | Learned policy   | Generalization
2020       | Puffer/Fugu      | Online learning  | Still open

Same arc as medium access: ALOHA → CSMA → CSMA/CA → OFDMA.

More information → better control → new failure at the boundary.

Same structure as TCP (L3) and CSMA/CA (L6): observe → estimate → decide → act. ABR is the slowest loop (2–10s vs. TCP’s 50–100ms vs. CSMA/CA’s 9μs).

What Comes Next

L12 (Thursday): Take the buffer away entirely.

Netflix adapts every 10 seconds. Zoom must adapt every millisecond.

150 ms total budget. Everything from today — large buffers, 10-second segments, per-segment ABR — is unusable.

What remains when buffering is forbidden?