Last lecture asked: who goes first? We opened the router’s packet queue and found four scheduling disciplines — FIFO, priority queuing, round-robin, weighted fair queuing — each a different answer to the Coordination invariant. FIFO is simple but unfair. Priority queuing differentiates but starves. WFQ achieves max-min fairness but demands per-flow state. The progression was about ordering: given that packets are in the queue, which one transmits next?
Today we confront a more dangerous question: what happens when there is no room?
The queue is full. A new packet arrives. The simplest answer — tail drop — discards the newcomer. Problem solved, right? The buffer absorbed what it could, dropped what it could not, and life goes on.
Except it does not. The tail-drop answer creates a failure that propagates far beyond the router. It reaches back to the sender, warps the sender’s belief about the network, and breaks applications you care about — video calls, online games, anything interactive. The failure has a name: bufferbloat. And understanding it requires the Environment-Measurement-Belief (E-M-B) decomposition from Chapter 1 — the three-layer diagnostic we used for TCP over satellite (L2), distance-vector routing (L3), and the ride-sharing dispatch problem on the midterm. The pattern: the environment is in one state, the measurement signal is degraded or misleading, and the system’s belief diverges from reality. Bufferbloat is the cleanest instance of this pattern in the entire course.
Act 1: The arithmetic of delay
A concrete router
Consider a home WiFi access point — the kind sitting on your desk right now. It has a buffer. How large? Router manufacturers, for reasons we will unpack shortly, made these buffers enormous. Let us work through the numbers for a modest example [1][5][6]:
- Link capacity: 10 Mbps (a slow uplink — think cable modem upload)
- Packet size: 1,500 bytes (standard Ethernet MTU)
- Buffer depth: 100 packets
How much delay does this buffer introduce when full?
\[\frac{100 \text{ packets} \times 1{,}500 \text{ bytes/packet} \times 8 \text{ bits/byte}}{10{,}000{,}000 \text{ bits/second}} = \frac{1{,}200{,}000 \text{ bits}}{10{,}000{,}000 \text{ bits/second}} = 0.12 \text{ seconds} = 120 \text{ ms}\]
120 milliseconds of pure queueing delay. Not propagation. Not processing. Just packets waiting in line.
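The same arithmetic can be wrapped in a small helper to see how the delay scales with buffer depth. This is a minimal sketch: the function name `full_buffer_delay_ms` and the two extra buffer depths are illustrative assumptions, not values from the text.

```python
def full_buffer_delay_ms(buffer_packets: int, packet_bytes: int, link_bps: float) -> float:
    """Milliseconds needed to drain a full buffer onto the link."""
    buffered_bits = buffer_packets * packet_bytes * 8
    return buffered_bits / link_bps * 1000.0


# The example from the text: 100 packets of 1,500 bytes on a 10 Mbps link.
example_ms = full_buffer_delay_ms(100, 1500, 10_000_000)
print(f"100 packets: {example_ms:.0f} ms")

# Hypothetical shallower and deeper buffers on the same link.
for depth in (10, 1000):
    delay = full_buffer_delay_ms(depth, 1500, 10_000_000)
    print(f"{depth} packets: {delay:.0f} ms")
```

The delay is linear in buffer depth and inversely proportional to link speed, so the same buffer that is harmless on a fast link becomes painful on a slow uplink.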
This distinction matters: a speed test run on an idle network (no other traffic) would show low latency — perhaps 15 ms to a nearby server. That is idle latency — it measures propagation + processing + transmission, but no queueing. Now run that same speed test while your roommate is downloading a large file. The buffer fills. The latency jumps to 120+ ms. That is working latency — latency under load, where the queueing delay dominates. Bufferbloat lives entirely in the gap between idle and working latency. A user who runs a speed test, sees “15 ms latency, 10 Mbps,” and thinks “my connection is fine” will have a terrible Zoom call the moment someone else starts streaming [5].
Why 120 ms matters
Remember L12. The 150-millisecond wall. Human conversation collapses when one-way delay exceeds 150 ms [1]. Jaber walked you through the delay budget:
| Component | Typical delay |
|---|---|
| Encoding | ~20 ms |
| Propagation (US coast-to-coast) | ~30 ms |
| Jitter buffer | 50–100 ms |
| Decoding | ~5–10 ms |
| Total without queueing | ~105–160 ms |
That budget was already tight. Now add 120 ms of bufferbloat. The total becomes 225–280 ms. The VoIP call is destroyed. Not degraded — destroyed. You start talking over each other. You both pause. Awkward silence. You both start again. The conversation becomes a walkie-talkie.
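To make the budget arithmetic explicit, here is a small sketch that sums the table's components with and without the 120 ms of queueing delay. The names `budget_ms`, `total_delay_ms`, and `verdict` are illustrative; the numbers are the ones from the table and the text.

```python
ONE_WAY_LIMIT_MS = 150  # conversational limit quoted in the text

# (low, high) estimates in milliseconds, copied from the table above.
budget_ms = {
    "encoding": (20, 20),
    "propagation": (30, 30),
    "jitter_buffer": (50, 100),
    "decoding": (5, 10),
}


def total_delay_ms(queueing_ms):
    """Return the (low, high) one-way delay once queueing is added."""
    low = sum(lo for lo, _ in budget_ms.values()) + queueing_ms
    high = sum(hi for _, hi in budget_ms.values()) + queueing_ms
    return low, high


def verdict(low_ms, high_ms):
    if high_ms <= ONE_WAY_LIMIT_MS:
        return "within budget"
    if low_ms > ONE_WAY_LIMIT_MS:
        return "over budget"
    return "borderline"


for label, queueing in (("idle", 0), ("bufferbloat", 120)):
    low, high = total_delay_ms(queueing)
    print(f"{label}: {low}-{high} ms, {verdict(low, high)}")
```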
And it is not just VoIP. An online game expecting 50 ms round-trip now sees 170 ms. A video conference expecting sub-100 ms sees 220 ms. Interactive web pages feel sluggish. Every latency-sensitive application on the network breaks, and the user has no idea why — their speed test says “10 Mbps,” which sounds fine.
The speed test is measuring the wrong thing. The problem is not bandwidth. The problem is delay. And the delay is hiding inside the buffer.
Act 2: Why TCP cannot see the queue
The feedback loop
TCP is a feedback control system. It sends packets, observes what happens, and adjusts. The specific feedback signal TCP uses is loss [1][6]. When a packet is lost — detected by a timeout or by duplicate ACKs — TCP interprets that as congestion: “the network is full, slow down.” TCP halves its congestion window (cwnd), reducing the sending rate. When no loss occurs, TCP interprets that as headroom: “the network can handle more, speed up.” TCP increases cwnd, growing the sending rate.
This feedback loop works beautifully when buffers are small. Send packets. Buffer absorbs a few. Buffer fills. Tail drop. TCP sees loss. TCP backs off. Buffer drains. TCP ramps up again. The sawtooth is fast, responsive, and stable.
Now make the buffer large — 100 packets, 120 ms worth. What happens?
The lie
TCP sends packets. The buffer absorbs them. And absorbs more. And more. No packets are lost. The buffer is enormous — it can hold 100 packets before anything drops. TCP observes zero loss. TCP’s interpretation: “no congestion — send faster.” So TCP increases cwnd. More packets flow into the buffer. Still no loss. TCP increases cwnd again. And again.
The queue is filling. Delay is growing — from 10 ms to 50 ms to 100 ms to 120 ms. But TCP does not measure delay. TCP measures loss. And there is no loss. So TCP’s belief about the network diverges from reality:
- Reality: The queue holds 100 packets. Delay is 120 ms. The link is saturated.
- TCP’s belief: No loss means no congestion. I can send faster. cwnd grows toward infinity.
The queue is lying to TCP. Not deliberately — it is doing exactly what it was designed to do: absorb packets. But the effect is a lie. The buffer creates the appearance of a healthy network while the reality is a congested one. TCP’s measurement signal — loss — arrives only when the buffer finally overflows. By then, the damage is done. Delay has been catastrophic for seconds or minutes.
Pause — let me ask you something
This is the State invariant decomposition from Chapter 1, and it is the cleanest example in the entire course. Let us be precise [6]:
Environment state: The true queue occupancy. 100 packets. 120 ms of delay at the bottleneck.
Measurement signal: Loss. TCP detects loss via timeouts or duplicate ACKs. But the large buffer degrades this signal catastrophically. Packets queue for 120 ms without dropping. Loss arrives only after the buffer overflows — late, sudden, and in a burst.
Internal belief: TCP’s congestion window, cwnd. This is the sender’s estimate of how fast it is safe to send. With no loss, nothing tells cwnd to stop: it keeps growing, round trip after round trip, until the buffer finally overflows. The belief diverges from reality.
The gap between environment (queue full, delay exploding) and belief (can send faster) — that gap is bufferbloat. It lives in the measurement signal failure. Loss is a proxy for congestion, but when the buffer is large, it is a degraded proxy. The signal arrives too late and too suddenly. TCP oscillates between extremes: ramp up for many RTTs (buffer fills silently), then receive a burst of loss and back off aggressively (buffer drains suddenly). Latency is never stable — it is either high (buffer filling) or low (buffer draining). The sawtooth becomes a disaster.
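The loss-driven sawtooth can be reproduced with a toy model. The sketch below is a deliberately crude per-RTT simulation of one flow doing additive increase and multiplicative decrease against a tail-drop buffer; it assumes a 30 ms base RTT (so a bandwidth-delay product of 25 packets on the 10 Mbps link), skips slow start, and uses the hypothetical name `simulate_tail_drop`. It is meant to show the shape of the problem, not to model real TCP.

```python
PACKET_MS = 1500 * 8 / 10_000_000 * 1000  # 1.2 ms to send one packet at 10 Mbps
BDP_PACKETS = 25                          # assumed: 30 ms base RTT on this link


def simulate_tail_drop(buffer_packets, rounds=500):
    """Return (peak queueing delay in ms, number of loss events)."""
    cwnd = 1.0
    peak_delay_ms = 0.0
    losses = 0
    for _ in range(rounds):
        # Packets beyond what the pipe holds sit in the buffer.
        queue = max(0.0, cwnd - BDP_PACKETS)
        if queue > buffer_packets:
            # Overflow: the only congestion signal the sender ever gets.
            losses += 1
            queue = float(buffer_packets)
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease
        else:
            cwnd += 1.0                # additive increase, one packet per RTT
        peak_delay_ms = max(peak_delay_ms, queue * PACKET_MS)
    return peak_delay_ms, losses


for depth in (10, 100):
    peak, losses = simulate_tail_drop(depth)
    print(f"buffer={depth:3d} packets: peak delay {peak:.0f} ms, {losses} loss events")
```

With the larger buffer the sender hears from the network far less often, and each silence between losses is spent pushing the queueing delay higher.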
Act 3: Why the buffers got so large
If large buffers cause bufferbloat, why did manufacturers install them? The answer is rational — and that makes the failure more interesting [5][6].
Burst absorption. When TCP opens a new connection, it sends an initial window of 10 packets as a burst. If the buffer is too small, the burst causes immediate loss. The connection backs off before it even gets started. Large buffers absorb these bursts, allowing TCP to ramp up efficiently.
Speed mismatch. A home router might receive data on a 1 Gbps Ethernet port and forward it to a 10 Mbps cable uplink. That is a 100:1 speed ratio. Without a buffer, every packet arriving on the fast port when the slow port is busy gets dropped immediately. A buffer bridges the gap.
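A quick calculation shows how fast the speed mismatch fills a buffer. This sketch assumes a sustained line-rate arrival on the fast port and reuses the 100-packet buffer from the earlier example; `buffer_fill_time_ms` is an illustrative name.

```python
def buffer_fill_time_ms(buffer_packets, packet_bytes, in_bps, out_bps):
    """Time until an initially empty buffer overflows under a sustained burst."""
    if in_bps <= out_bps:
        return float("inf")  # the queue never grows
    buffered_bits = buffer_packets * packet_bytes * 8
    return buffered_bits / (in_bps - out_bps) * 1000.0


fill_ms = buffer_fill_time_ms(100, 1500, 1_000_000_000, 10_000_000)
print(f"100-packet buffer fills in {fill_ms:.2f} ms")
```

The buffer fills in roughly a millisecond but takes 120 ms to drain, which is why even short line-rate bursts matter on a slow uplink.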
Cheap memory. DRAM prices fell exponentially. Adding 128 MB of buffer memory to a router costs pennies. Router manufacturers sized buffers to maximize throughput metrics — link utilization — because throughput is easy to measure, easy to market, and easy to compare. “Our router achieves 99.9% link utilization!” sounds great on a spec sheet. Latency is harder to measure, harder to market, and was not part of the conversation.
The incentive was misaligned. Manufacturers optimized for throughput. Users cared about responsiveness. The metric the engineers measured was not the metric the users experienced. Jim Gettys, one of the original X Window System developers, diagnosed this failure in 2011 and gave it the name “bufferbloat” — dark buffers hiding latency throughout the Internet [5].
The core lesson: the buffers were not wrong. The absence of a signal was wrong. A buffer that absorbs packets is doing its job. A buffer that absorbs packets and tells no one is creating a measurement failure. What we need is a buffer that absorbs packets and signals congestion before it fills.
That idea is called Active Queue Management.
Act 4: AQM — redesigning the measurement signal
The insight behind AQM is simple and powerful: instead of waiting for the buffer to overflow and dropping everything (tail drop), drop packets early — before the buffer fills — so that TCP receives a congestion signal while there is still time to react [2][6].
Tail drop sends one signal: “the buffer overflowed.” AQM sends a richer signal: “the buffer is filling — slow down now, before it is too late.”
This is a redesign of the State invariant. The queue no longer passively stores packets. It actively monitors its own congestion and proactively generates signals for transport. The evolution of AQM is a series of refinements to one question: what should the queue measure to detect congestion?
Three generations answered that question differently. Each generation solved the previous one’s failure.
Act 5: RED — measuring queue length (1993)
The idea
Sally Floyd and Van Jacobson proposed Random Early Detection in 1993 [2]. The idea sounds elegant: compute an exponentially weighted moving average (EWMA) of the queue length. If the average is low, accept packets. If the average is high, drop packets probabilistically. The probability increases linearly between two thresholds.
The algorithm [2][6]:
    if EWMA < min_threshold:
        drop_prob = 0    # queue is fine: accept everything
    else if EWMA > max_threshold:
        drop_prob = 1    # queue is overloaded: drop everything
    else:
        drop_prob = max_p × (EWMA - min_threshold) / (max_threshold - min_threshold)
Two design choices need explanation:
Why average, not instantaneous? Because instantaneous queue length is noisy. A burst of 50 packets arrives, spikes the queue, and drains in milliseconds. RED does not want to react to that — the burst will clear on its own. Averaging smooths out transients, so RED only reacts to persistent congestion. That is the theory.
Why probabilistic, not deterministic? If RED dropped deterministically — say, every Nth packet when the threshold is crossed — multiple TCP flows would synchronize. They would all see loss at the same time, all back off at the same time, all ramp up at the same time, creating oscillation. Randomizing the drops desynchronizes the flows: each flow sees loss at a different time, backs off at a different time, and the aggregate traffic smooths out. This is called avoiding global synchronization [2].
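The RED rule above translates almost line for line into Python. This is a simplified sketch of the rule as stated in this lecture, not the full algorithm from the 1993 paper (which also spaces drops using a packet counter). The class name `Red` and all default parameter values are illustrative assumptions.

```python
import random


class Red:
    def __init__(self, min_threshold=5.0, max_threshold=15.0, max_p=0.1, weight=0.002):
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold
        self.max_p = max_p
        self.weight = weight  # EWMA weight given to the newest sample
        self.avg = 0.0        # smoothed queue length in packets

    def update_avg(self, queue_len):
        """Fold the instantaneous queue length into the moving average."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len

    def drop_probability(self):
        if self.avg < self.min_threshold:
            return 0.0
        if self.avg > self.max_threshold:
            return 1.0
        span = self.max_threshold - self.min_threshold
        return self.max_p * (self.avg - self.min_threshold) / span

    def should_drop(self, queue_len, rng=random):
        """Call once per arriving packet; the random draw desynchronizes flows."""
        self.update_avg(queue_len)
        return rng.random() < self.drop_probability()
```

Even this small sketch needs four constructor arguments, and none of them has an obvious default.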
Why RED failed
RED sounds reasonable. Decades of research followed. Papers were published. Parameters were tuned. And yet RED was rarely deployed in practice. Why?
The fundamental problem: queue length is the wrong measurement signal [3][6].
A long queue can mean two completely different things:
- A transient burst that will drain on its own. This is healthy — do not drop.
- Persistent overload requiring senders to slow down. This is unhealthy — do drop.
RED cannot distinguish them. Consider two scenarios on a 10 Mbps link:
- Scenario 1: A video application suddenly sends a burst of 100 packets. The queue spikes to roughly 100 packets, then drains on its own in about 120 ms (1.2 ms per packet at 10 Mbps).
- Scenario 2: Two TCP flows that together keep the link saturated hold a persistent queue of 40 packets that never drains.
RED’s EWMA (smoothed over ~100 ms) might reach 30 packets in both cases. RED makes the same dropping decision for both. But they require opposite responses: Scenario 1 needs no drops (the burst will clear), while Scenario 2 needs drops (to reduce the sending rate).
RED confuses a good queue (burst absorption, drains quickly) with a bad queue (persistent overload, never drains). It lacks the measurement to tell them apart.
And then there is the tuning problem. RED exposes four knobs (min_threshold, max_threshold, max_drop_probability, EWMA weight), and good values for them depend on three further properties of the deployment: link capacity, RTT distribution, and traffic pattern. That is seven interacting quantities to get right at once, and the optimal values differ for every deployment. Network operators found RED impossible to configure reliably. A setting that worked on a campus backbone failed on a residential access link. No single configuration worked everywhere [3][6].
The result: RED was a brilliant idea that solved the wrong measurement problem. It told the queue to look at itself (queue length) instead of looking at what actually mattered (how long packets were waiting).
Act 6: CoDel — measuring sojourn time (2012)
The breakthrough
Kathleen Nichols and Van Jacobson — yes, the same Jacobson who co-invented RED — returned to the problem nearly twenty years later with CoDel (Controlled Delay) [3]. The key insight was a single question: instead of measuring how many packets are in the queue, why not measure how long each packet waits?
That question changes everything [3][6].
Sojourn time is the time a packet spends in the queue — from the moment it arrives (enqueue) to the moment it departs (dequeue). Sojourn time naturally separates the two cases RED could not distinguish:
- Transient burst: Packets arrive in a burst but dequeue quickly. Sojourn times are short, even if the queue is momentarily large.
- Persistent overload: Packets accumulate and wait. Sojourn times are long, because the queue is not draining.
A burst creates a tall queue that drains fast (short sojourn). Persistent overload creates a tall queue that stays tall (long sojourn). Queue length cannot tell the difference. Sojourn time can.
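A tiny FIFO simulation makes the contrast concrete. The sketch below serves one packet per 1.2 ms tick (a 1,500-byte packet at 10 Mbps) and feeds it two hypothetical arrival traces: a 40-packet burst followed by light traffic, and a 40-packet burst followed by arrivals at exactly the link rate. The trace shapes and the names `run_fifo` and `min_recent_sojourn` are assumptions made for illustration.

```python
from collections import deque

TICK_MS = 1.2      # time to send one 1,500-byte packet at 10 Mbps
WINDOW_TICKS = 83  # roughly 100 ms
TICKS = 200


def run_fifo(arrivals):
    """Serve one packet per tick. Return (peak queue length, [(tick, sojourn_ms)])."""
    queue = deque()  # holds the enqueue tick of each waiting packet
    peak = 0
    departures = []
    for tick, count in enumerate(arrivals):
        queue.extend([tick] * count)
        peak = max(peak, len(queue))
        if queue:
            departures.append((tick, (tick - queue.popleft()) * TICK_MS))
    return peak, departures


def min_recent_sojourn(departures, last_tick):
    """Minimum sojourn time among packets that left in the final ~100 ms."""
    return min(s for tick, s in departures if tick > last_tick - WINDOW_TICKS)


burst = [40] + [1 if t % 4 == 0 else 0 for t in range(1, TICKS)]
overload = [40] + [1] * (TICKS - 1)

for name, trace in (("burst", burst), ("overload", overload)):
    peak, departures = run_fifo(trace)
    recent = min_recent_sojourn(departures, TICKS - 1)
    print(f"{name}: peak queue {peak} packets, min recent sojourn {recent:.1f} ms")
```

Both traces reach the same peak queue length, so a length-based signal sees them as equally bad at that moment. The minimum sojourn time over the last window separates them cleanly.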
The algorithm
CoDel tracks the minimum sojourn time over the last 100 ms [3]. Why the minimum? Because if any packet in the observation window got through quickly, the queue is still draining — the congestion is transient. Only when the minimum sojourn time exceeds the target does CoDel conclude that congestion is persistent.
    On each packet dequeue:
        sojourn = now - packet.enqueue_time
        Track min_sojourn over the last 100 ms
        if min_sojourn > target (5 ms):
            # Persistent congestion: start dropping
            drop_count += 1
            next_drop_time = now + interval / sqrt(drop_count)
            DROP packet at next_drop_time
        else:
            # Queue is healthy: reset
            drop_count = 0
            do not drop
The drop rate accelerates as congestion persists: the first drop is gentle, but if congestion continues, drops come faster (the $1/\sqrt{n}$ schedule). If congestion clears — min_sojourn drops below the target — CoDel resets immediately. No lingering aggression [3].
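Here is one way the dequeue logic could look as runnable Python. It is a simplified sketch, not the reference implementation: instead of storing a window of samples, it treats "minimum sojourn above target for 100 ms" as "every packet has been above target for a full interval", and it resets the drop count as soon as one packet gets through quickly, as described above. The class name `CoDel` and the method `should_drop` are illustrative.

```python
import math


class CoDel:
    def __init__(self, target=0.005, interval=0.100):
        self.target = target            # acceptable standing delay, seconds
        self.interval = interval        # observation window, seconds
        self.first_above_time = None    # deadline for declaring congestion persistent
        self.dropping = False
        self.drop_count = 0
        self.next_drop_time = 0.0

    def should_drop(self, enqueue_time, now):
        """Call at dequeue. Returns True if this packet should be dropped."""
        sojourn = now - enqueue_time
        if sojourn < self.target:
            # A packet got through quickly, so the window minimum is low: reset.
            self.first_above_time = None
            self.dropping = False
            self.drop_count = 0
            return False
        if self.first_above_time is None:
            # First packet above target: start the clock, do not drop yet.
            self.first_above_time = now + self.interval
            return False
        if not self.dropping:
            if now < self.first_above_time:
                return False
            # Above target for a whole interval: congestion is persistent.
            self.dropping = True
            self.drop_count = 1
            self.next_drop_time = now + self.interval / math.sqrt(self.drop_count)
            return True
        if now >= self.next_drop_time:
            # Still congested: drop again, with a shorter gap each time.
            self.drop_count += 1
            self.next_drop_time += self.interval / math.sqrt(self.drop_count)
            return True
        return False
```

Feeding this a steady stream of packets with 50 ms sojourn times produces no drop for the first 100 ms, then drops whose spacing shrinks from 100 ms toward a few tens of milliseconds.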
Why 5 ms?
The target delay of 5 ms is not arbitrary [3][6]. Think about the applications:
- VoIP sends a frame every 20 ms. A 5 ms target means each packet spends about a quarter of a frame period (5 ms out of 20 ms) in the queue. The jitter buffer can handle this easily.
- A video call expecting sub-100 ms latency gets 5 ms of queue delay — well within budget.
- An interactive web request adds 5 ms — imperceptible to a human.
5 ms is conservative enough that interactive applications work, yet aggressive enough that the queue stays short.
The good queue vs. bad queue distinction
This is the conceptual breakthrough that makes CoDel fundamentally different from RED [3][6]. CoDel defines:
- Good queue: Packets are present (the buffer is absorbing a burst), but they drain quickly. Min sojourn is below 5 ms. CoDel does nothing — this queue is doing its job. Nichols and Jacobson’s original CoDel paper [3] illustrates this with a pipe diagram: a TCP connection starts, packets fill the buffer as slow-start ramps up, but after one RTT the sender receives ACKs and the arrival rate stabilizes at the link rate — the burst drains and the standing queue disappears.
- Bad queue: Packets are present and not draining. Min sojourn exceeds 5 ms persistently. CoDel drops packets — this queue is causing bufferbloat. In the pipe diagram, this corresponds to a sender whose window exceeds the BDP — the excess packets form a standing queue that never drains because the sender keeps refilling it.
RED could not make this distinction because queue length does not encode it. CoDel can because sojourn time is the distinction. A queue of 80 packets with a 2 ms min sojourn is fine — the packets are flowing through. A queue of 20 packets with a 50 ms min sojourn is a problem — the packets are stuck.
Zero tuning
CoDel has one control parameter: the target delay (5 ms). The observation interval (100 ms) is fixed. There is no EWMA weight, no min/max threshold pair, no max drop probability. Different networks can use different targets, but for most deployments, 5 ms works out of the box [3].
Compare this to RED’s seven parameters. CoDel is self-tuning because it measures the right quantity. When you measure the right thing, you do not need knobs to compensate for measuring the wrong thing.
The cost
CoDel requires per-packet timestamping. Every packet must be stamped at enqueue and checked at dequeue. At 10 Gbps with minimum-size packets, that is roughly 15 million packets per second, each needing a timestamp read and a comparison. In software (Linux kernel), this is feasible — CoDel is the default AQM in Linux. In hardware (high-speed ASICs), per-packet timestamping is expensive: it requires dedicated silicon and adds latency to the forwarding pipeline [3][4].
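The 15 million figure follows from Ethernet framing. The sketch below assumes the standard minimum frame of 64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap, details that are not spelled out in the text above.

```python
LINK_BPS = 10_000_000_000
WIRE_BYTES_MIN_FRAME = 64 + 8 + 12  # frame + preamble + inter-frame gap

packets_per_second = LINK_BPS / (WIRE_BYTES_MIN_FRAME * 8)
budget_ns = 1e9 / packets_per_second

print(f"{packets_per_second / 1e6:.2f} million packets per second")
print(f"{budget_ns:.1f} ns per packet")
```

At that rate the forwarding pipeline has well under 100 ns per packet, which is why an extra timestamp read and comparison is a real cost in hardware.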
This cost motivated the next generation.
Act 7: PIE — estimating delay without timestamps (2013)
The engineering problem
CoDel is elegant and effective. But DOCSIS cable modem vendors had a problem: their silicon was already fabricated. Adding per-packet timestamping required a chip respin — millions of dollars and years of delay. They needed an algorithm that achieved CoDel’s goals without per-packet overhead [4].
The insight
Rong Pan and colleagues at Cisco proposed PIE (Proportional Integral controller Enhanced) in 2013 [4]. The key idea: you do not need to timestamp every packet to estimate queue delay. You can calculate it:
\[\text{delay\_estimate} = \frac{\text{queue\_length}}{\text{departure\_rate}}\]
If the queue holds Q bytes and packets are leaving at rate R bytes per second, then the time to drain the queue is approximately Q/R seconds. This estimates sojourn time without touching any individual packet [4][6].
The algorithm
PIE runs on a periodic timer — every 10 to 100 ms [4]:
    Every update interval (e.g., 10 ms):
        departure_rate = bytes_departed / interval_duration
        delay_estimate = queue_length / departure_rate
        error = delay_estimate - target_delay
        # PI controller update
        drop_prob = drop_prob + alpha * error + beta * (error - prev_error)
        drop_prob = clamp(drop_prob, 0, 1)
        prev_error = error

    On each packet arrival:
        if random() < drop_prob:
            DROP
        else:
            ENQUEUE
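The PI controller (proportional-integral) is borrowed from control theory [4]. Because the update is incremental, with each interval adding to the previous drop_prob, the two terms play roles that are easy to misread. The alpha * error term accumulates interval after interval, so it supplies the integral action: as long as the estimated delay sits above the target, drop_prob keeps rising. The beta * (error - prev_error) term sums over time to beta * error, so it supplies the proportional action and reacts at once to whether the gap is growing or shrinking. Together, they track the target delay stably, without oscillation.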
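The periodic update can be sketched in a few lines of Python. This is a minimal illustration of the update rule above, not the full published algorithm. The class name `Pie`, the default values for `target_delay`, `alpha`, and `beta`, and the choice to keep the previous delay estimate when nothing departed during an interval are all assumptions of this sketch.

```python
import random


class Pie:
    def __init__(self, target_delay=0.015, alpha=0.125, beta=1.25):
        self.target_delay = target_delay  # seconds
        self.alpha = alpha
        self.beta = beta
        self.drop_prob = 0.0
        self.prev_error = 0.0
        self.delay_estimate = 0.0

    def update(self, queue_bytes, bytes_departed, interval):
        """Run once per update interval (interval in seconds)."""
        departure_rate = bytes_departed / interval
        if departure_rate > 0:
            self.delay_estimate = queue_bytes / departure_rate
        # Assumption: with no departures, reuse the last estimate
        # rather than divide by zero.
        error = self.delay_estimate - self.target_delay
        self.drop_prob += self.alpha * error + self.beta * (error - self.prev_error)
        self.drop_prob = min(1.0, max(0.0, self.drop_prob))
        self.prev_error = error

    def should_drop(self, rng=random):
        """Run once per arriving packet: no timestamps, just a coin flip."""
        return rng.random() < self.drop_prob


pie = Pie()
# 150,000 bytes queued while 12,500 bytes left in 10 ms (10 Mbps): 120 ms of delay.
pie.update(queue_bytes=150_000, bytes_departed=12_500, interval=0.01)
print(f"estimated delay {pie.delay_estimate * 1000:.0f} ms, drop_prob {pie.drop_prob:.3f}")
```

All the per-packet path does is compare a random number against `drop_prob`; the arithmetic lives in the timer.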
Why PIE won the deployment race
PIE is cheaper to implement than CoDel [4][6]:
- No per-packet timestamps. The queue length and departure rate are statistics routers already track for billing and monitoring.
- The control loop runs periodically (every 10 ms), not per-packet. At 10 Gbps, that is one computation every 10 ms versus 15 million per second.
- Existing firmware on DOCSIS cable modem chipsets could implement PIE without a silicon respin.
The tradeoff: estimated delay is less accurate than measured sojourn time. If the departure rate is zero during an interval (no packets sent), the division is undefined. During bursty traffic with silent periods, PIE’s estimate can be noisy. But in practice, PIE works nearly as well as CoDel for most traffic patterns [4][6].
The deployment evidence is striking. PIE became the standard AQM in DOCSIS 3.1 cable modems — shipping in millions of devices worldwide. CoDel’s theoretical superiority could not overcome PIE’s practical deployment advantage. This is a recurring pattern in networking: the algorithm that is good enough and shippable beats the algorithm that is optimal and expensive.
Act 8: The design pattern — measurement signal evolution
Step back and look at the arc [6]:
| Generation | Year | What it measures | Signal type | Tuning | Deployed? |
|---|---|---|---|---|---|
| Tail drop | — | Nothing | Buffer overflow | None | Everywhere (default) |
| RED | 1993 | Queue length (EWMA) | Indirect (space) | 7+ parameters | Rarely |
| CoDel | 2012 | Sojourn time (min over window) | Direct (time) | 1 parameter | Linux default |
| PIE | 2013 | Estimated delay (Q/R) | Approximated (time) | 2 parameters | DOCSIS 3.1 (millions) |
Each step is a redesign of the State invariant: what information should the queue maintain internally to make good control decisions?
- RED’s answer: an EWMA of queue length. Indirect, ambiguous, requires manual tuning.
- CoDel’s answer: the minimum sojourn time. Direct, unambiguous, self-tuning.
- PIE’s answer: an estimated queue delay from rate measurements. Nearly as good as CoDel, vastly cheaper.
The progression also illustrates closed-loop reasoning. As the measurement signal improves, the feedback loop becomes more stable and responsive. RED’s loop oscillates because the signal is degraded — it cannot distinguish good queues from bad queues. CoDel’s loop stabilizes because sojourn time is a faithful signal of what actually matters. PIE’s loop approximates CoDel’s stability at a fraction of the cost.
Beyond single-queue AQM: per-flow isolation and explicit signals
RED, CoDel, and PIE all operate on a single shared queue. Every flow’s packets sit in the same buffer, managed by the same AQM controller. A single greedy flow — a bulk download opening ten parallel TCP connections — can monopolize the queue. The AQM reacts to aggregate congestion, not per-flow congestion. A VoIP call sharing the buffer with a torrent download gets the same treatment as the torrent.
This reveals a critical limitation: AQM solves the measurement signal problem (when to drop), but it does not solve the Coordination problem of fairness (who gets served). The solution combines L13’s fair queuing insight with L14’s AQM insight.
FQ-CoDel: per-flow queuing + per-flow AQM
FQ-CoDel (Fair Queuing + Controlled Delay) combines two ideas we have studied separately [3]:
Fair queuing (from L13): hash each packet’s 5-tuple (source IP, destination IP, source port, destination port, protocol) to assign it to one of ~1,024 queues. Barring hash collisions, each flow gets its own queue. A DRR scheduler (from L13) visits the non-empty queues in turn and lets each send up to a quantum of bytes per round: the same fair scheduling we studied, applied per-flow.
CoDel (from this lecture): run a separate CoDel instance on each per-flow queue. If one flow’s queue builds up (sojourn time exceeds 5 ms), CoDel drops packets from that flow only. Other flows are unaffected.
The combination is more than the sum of its parts. Under CoDel alone (single queue), a torrent download inflates the queue for everyone — a VoIP packet sits behind hundreds of torrent packets. Under FQ-CoDel, the torrent’s packets are in their own queue. The VoIP packets are in a separate queue. DRR ensures both get served promptly. And if the torrent’s queue builds up, CoDel drops torrent packets — not VoIP packets.
FQ-CoDel manages two sets of queues: new queues (flows that just started) and old queues (flows that have been active). New flows get priority — their first packets are served immediately, giving short flows (DNS lookups, web page loads) low latency. Old flows (long downloads) are served fairly but without priority. This means a web page load that starts during a large download gets its first packets through almost instantly [3].
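The flow-isolation half of FQ-CoDel can be sketched compactly. The code below covers only the hashing and a deficit-based round robin; it leaves out the per-queue CoDel instances and the new/old queue lists. The names `flow_queue_index` and `FlowQueues`, the use of `zlib.crc32` as a stand-in hash, the 1,514-byte quantum, and the example addresses are all assumptions made for illustration.

```python
import zlib
from collections import deque

NUM_QUEUES = 1024
QUANTUM = 1514  # bytes of credit added per round (assumed value)


def flow_queue_index(src_ip, dst_ip, src_port, dst_port, proto):
    """Map a 5-tuple to one of NUM_QUEUES buckets (CRC32 as a stand-in hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % NUM_QUEUES


class FlowQueues:
    def __init__(self):
        self.queues = {}       # bucket index -> deque of (five_tuple, size_bytes)
        self.deficit = {}      # bucket index -> remaining byte credit
        self.active = deque()  # round-robin order of non-empty buckets

    def enqueue(self, five_tuple, size_bytes):
        idx = flow_queue_index(*five_tuple)
        queue = self.queues.setdefault(idx, deque())
        if not queue:
            # The bucket becomes active with a fresh quantum of credit.
            self.deficit[idx] = QUANTUM
            self.active.append(idx)
        queue.append((five_tuple, size_bytes))

    def dequeue(self):
        while self.active:
            idx = self.active[0]
            if self.deficit[idx] <= 0:
                # Credit used up: top it up and move to the back of the line.
                self.deficit[idx] += QUANTUM
                self.active.rotate(-1)
                continue
            packet = self.queues[idx].popleft()
            self.deficit[idx] -= packet[1]
            if not self.queues[idx]:
                self.active.popleft()
            return packet
        return None


fq = FlowQueues()
torrent_flow = ("10.0.0.2", "203.0.113.5", 51413, 6881, "tcp")
voip_flow = ("10.0.0.3", "198.51.100.9", 40000, 5060, "udp")
for _ in range(10):
    fq.enqueue(torrent_flow, 1500)
fq.enqueue(voip_flow, 200)

order = []
while (packet := fq.dequeue()) is not None:
    order.append(packet[0])
print("VoIP packet left in position", order.index(voip_flow) + 1, "of", len(order))
```

Unless the two flows happen to hash to the same bucket, the VoIP packet leaves after at most a couple of torrent packets even though it arrived behind ten of them.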
FQ-CoDel is the default queue discipline in Linux since kernel 3.12 (2013). It requires no configuration — the hash-based flow identification and CoDel parameters work across a wide range of link speeds and traffic mixes.
CAKE: FQ-CoDel for real home networks
CAKE (Common Applications Kept Enhanced) extends FQ-CoDel for the specific challenges of home and ISP networks [3]:
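- Per-host fairness, not just per-flow: FQ-CoDel gives each flow a fair share. But one host running 20 parallel download connections gets 20× the share of a host with one connection. CAKE adds host isolation on top of flow isolation: capacity is first divided fairly among hosts, then among each host’s flows, so each device on the network gets an equal share regardless of how many flows it opens. (CAKE’s 8-way set-associative hashing serves a different purpose: it keeps distinct flows from colliding in the same queue.)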
- DiffServ-aware scheduling: CAKE maps DiffServ code points (traffic class markings) to priority tiers — voice traffic (EF) gets strict priority over bulk downloads (BK). This is the priority queuing from L13, integrated with per-flow AQM.
- Ingress shaping: CAKE can be deployed on the inbound side of a home router, shaping incoming traffic to the actual uplink/downlink speed. This prevents bufferbloat in the ISP’s equipment by ensuring the home router is the bottleneck (where CAKE controls the queue), not the ISP’s CMTS.
CAKE is deployed in OpenWrt (the most widely used open-source router firmware) and is the recommended queue discipline for home networks.
L4S: explicit congestion signals and dual queues
All AQM schemes so far signal congestion by dropping packets — a blunt, destructive signal. The sender detects the drop (via timeout or duplicate ACKs) and reduces its rate. But dropping destroys data. The sender must retransmit, wasting bandwidth and adding delay.
ECN (Explicit Congestion Notification) offers a better interface: instead of dropping a packet, the router marks it — sets a bit in the IP header. The receiver echoes the mark to the sender, which reduces its rate just as if a drop had occurred. But no data is lost. No retransmission is needed. The signal is richer and less destructive [3].
The problem: classic TCP (Reno, Cubic) responds to an ECN mark the same way it responds to a drop, by cutting its window sharply (Reno halves it). It can therefore sustain a high rate only when marks are rare, on the order of 0.1% of packets. Newer “scalable” congestion control algorithms (DCTCP, TCP Prague) instead reduce their window in proportion to the fraction of packets marked, so each mark costs little and they expect marks far more often, on the order of 5%. If both share the same queue:
- Mark at 0.1% → classic TCP is satisfied, scalable TCP barely slows down and takes most of the link
- Mark at 5% → scalable TCP is satisfied, classic TCP cuts its window again and again and starves
L4S (Low Latency, Low Loss, Scalable Throughput) solves this with a dual-queue coupled AQM: classic traffic goes in one queue, scalable (L4S-marked) traffic goes in another. Each queue has its own AQM tuned to its transport’s response function. A coupling mechanism ensures that both queues get equal throughput shares despite their different marking rates — the classic queue’s drop probability drives the scalable queue’s marking probability [3].
L4S is the current frontier of AQM research and deployment: the IETF published the L4S specifications in 2023, with RFC 9332 covering the dual-queue coupled AQM, and it is seeing experimental deployment in DOCSIS 3.1 cable networks.
Summary of key ideas
- Bufferbloat is the canonical E-M-B failure: large buffers degrade the measurement signal (loss), causing TCP’s belief (cwnd) to diverge from reality (queue full).
- A 100-packet buffer at 10 Mbps adds 120 ms of queueing delay, which pushes a VoIP call’s one-way delay to 225–280 ms, far past the 150 ms limit. Idle latency (speed test) misses this; working latency (under load) reveals it.
- AQM = drop packets before the buffer fills, so transport gets early warning.
- RED (1993): measures queue length. Cannot distinguish burst from overload. Seven tuning parameters. Rarely deployed.
- CoDel (2012): measures sojourn time. Distinguishes good queues from bad queues. One parameter. Linux default.
- PIE (2013): estimates delay without per-packet timestamps. PI controller. Deployed in millions of DOCSIS cable modems.
- FQ-CoDel: combines per-flow fair queuing with per-flow CoDel. Isolates flows. Linux default since 2013.
- CAKE: extends FQ-CoDel with per-host fairness, DiffServ awareness, and ingress shaping. Standard for home networks.
- L4S: dual-queue coupled AQM with ECN marking instead of dropping. Separates classic and scalable transport. The current frontier.
- The arc from RED to CoDel to FQ-CoDel to L4S is a progression in both measurement signal quality (space → time → per-flow time) and interface richness (drop → mark → per-class mark).
References
[1] Kurose, J. F. and Ross, K. W. (2021). Computer Networking, 8th Edition. Pearson.
[2] Floyd, S. and Jacobson, V. (1993). “Random Early Detection Gateways for Congestion Avoidance.” IEEE/ACM Transactions on Networking, 1(4), 397-413.
[3] Nichols, K. and Jacobson, V. (2012). “Controlling Queue Delay.” ACM Queue, 10(5).
[4] Pan, R., Natarajan, P., Piglione, C., Prabhu, M. S., Cybenko, V., Baker, F., and Bump, B. (2013). “PIE: A Lightweight Control Scheme to Address the Bufferbloat Problem.” IEEE International Conference on High Performance Switching and Routing (HPSR).
[5] Gettys, J. and Nichols, K. (2011). “Bufferbloat: Dark Buffers in the Internet.” ACM Queue, 9(11).
[6] Gupta, A. (2026). A First-Principles Approach to Networked Systems, Ch. 5: Queue Management. UC Santa Barbara.