```mermaid
flowchart TD
A[Interface: IP best-effort]:::constraint --> B[Coordination: distributed, no admin]:::constraint
B --> C[State: loss-only signal]:::failure
C --> D[Bufferbloat + synchronization]:::failure
D --> E[Fix: ECN marks]:::fix
E --> F[New Gap: middlebox bleaching + fairness]:::failure
F --> G[Fix: L4S DualQ + AccECN]:::fix
G --> H[New Gap: partial deployment]:::failure
B --> I[ISP inserts token bucket]:::failure
I --> J[Transport belief broken]:::failure
J --> K[Fix: expose bucket OR CoDel at bucket]:::fix
A --> L[Socket API hides QoS]:::constraint
L --> M[App re-measures at layer 7]:::failure
M --> N[DASH, VMAF, Conviva]:::fix
B --> O[Relax coordination: single DC admin]:::fix
O --> P[DCTCP -> HPCC -> Swift]:::fix
classDef constraint fill:#cfe2ff,stroke:#084298,color:#000
classDef failure fill:#f8d7da,stroke:#842029,color:#000
classDef fix fill:#d1e7dd,stroke:#0f5132,color:#000
```
10 System Composition — Transport Meets Queue Management
10.2 Act 1: The Coupling Problem — Loss as the Only Signal (1988)
It’s 1988. Van Jacobson has just rescued the ARPANET from congestion collapse with AIMD. Every TCP flow infers congestion from the absence of an ACK. When a packet is dropped, the sender halves its window. When ACKs return, the sender grows its window linearly. The loop closes through the router’s tail-drop behavior: the queue fills, the queue overflows, the dropped packet becomes the signal.
“We use ‘packet loss’ as a congestion signal. The network tells us to slow down by dropping packets. There is no other signal available.” — Jacobson, 1988 (Jacobson 1988)
What the pioneers saw: A simple interface. Routers are dumb; they drop when full. Endpoints are smart; they infer. The signal is binary — either an ACK arrives or it doesn’t. The simplicity is the point: no cross-layer cooperation needed, no new header bits, no middlebox upgrades. TCP works over any IP network that can drop packets.
What remained invisible from the pioneers’ vantage point: Two pathologies that would emerge only at scale. First, global synchronization — when a tail-drop queue overflows, it drops from every flow simultaneously, and every flow halves its window at the same RTT (Floyd and Jacobson 1993). Utilization collapses to ~50%, then recovers together, then collapses again. The loops are not independent; they are phase-locked by the shared drop event. Second, bufferbloat (Gettys and Nichols 2012): if the buffer is large enough, the loss signal arrives too late. Transport sends at its believed rate for many RTTs while the queue silently grows to hundreds of milliseconds. By the time loss arrives, latency has already inflated.
10.2.1 The Solution: AIMD + Tail Drop
Transport applies closed-loop reasoning: sensor = missing ACK, estimator = duplicate-ACK counter, controller = halve cwnd (congestion window), actuator = reduced send rate. Queue management applies decision placement at the distributed extreme: every router decides independently, no coordination. The interface between them is deliberately minimal — one bit per packet, delivered by packet death.
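The loop above can be reduced to a one-line controller per ACK-or-loss event. A minimal sketch (illustrative, not Jacobson's actual BSD code; `mss` normalizes the window to packet units):

```python
def aimd_step(cwnd: float, acked: bool, mss: float = 1.0) -> float:
    """One AIMD control step: the sensor is 'did the ACK arrive or not'."""
    if acked:
        return cwnd + mss * mss / cwnd  # additive increase: ~1 MSS per RTT
    return max(cwnd / 2.0, mss)         # multiplicative decrease on loss

# A loss every 100 events produces the familiar sawtooth.
trace, cwnd = [], 10.0
for i in range(1, 301):
    cwnd = aimd_step(cwnd, acked=(i % 100 != 0))
    trace.append(cwnd)
```

The sawtooth is the signature of the one-bit interface: the window climbs linearly until the queue overflows, then collapses by half.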
10.2.2 Invariant Analysis: Loss-Based Coupling (1988)
| Invariant | Answer (1988) | Gap? |
|---|---|---|
| State | Loss events = congestion | Sparse at high speed |
| Time | Per-RTT feedback | Lags buffer growth |
| Coordination | Distributed, uncoordinated | Global synchronization |
| Interface | 1 bit (drop or deliver) | Destructive signal |
The State gap is the critical one. By the square-root throughput relation, sustaining 100 Mbps with 1500-byte packets over a 50 ms RTT requires a loss probability near 10⁻⁵ — transport sees 1 signal per ~10⁵ packets. At 100 Gbps the required loss rate falls below 10⁻¹¹; the signal is effectively silent until the queue overflows catastrophically. Transport’s belief about capacity drifts arbitrarily far from the environment’s actual state before a loss correction arrives.
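The arithmetic can be checked with the standard square-root relation, rate ≈ (MSS/RTT)·√(3/(2p)) (Mathis-style form; the 1500-byte packet size is an assumption here):

```python
def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Loss probability p needed to sustain rate_bps under the relation
    rate = (MSS/RTT) * sqrt(3 / (2p)), solved for p."""
    window_pkts = rate_bps * rtt_s / (mss_bytes * 8)  # bandwidth-delay product
    return 3.0 / (2.0 * window_pkts ** 2)

p_100m = required_loss_rate(100e6, 0.05)  # ~8.6e-6: one signal per ~10^5 packets
p_100g = required_loss_rate(100e9, 0.05)  # ~8.6e-12: effectively silent
```

The quadratic dependence on window size is why the loss signal starves as links get faster: a 1000× rate increase demands a 10⁶× lower loss rate.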
10.2.3 Environment → Measurement → Belief
| Layer | What Transport Has | What’s Missing |
|---|---|---|
| Environment | Actual queue depth, link rate | Not observable to endpoint |
| Measurement | Missing ACKs, RTT samples | No queue-depth signal |
| Belief | cwnd = estimate of capacity | Lags reality by ~buffer/link |
The E→M gap is physically limited, not accidentally noisy: the IP interface exposes only two events (deliver, drop), and the endpoint sees only those. Fixing this requires changing the interface — adding a new event type.
10.2.4 “The Gaps Didn’t Matter… Yet”
In 1988, buffers were small (tens of packets), links were slow (1.5–45 Mbps), and synchronization produced tolerable 50 % utilization. The gap between belief and reality was bounded by small buffers. The next decade would change both.
10.3 Act 2: ECN — Adding the Third Event (2001)
It’s 2001. Backbone links are hitting gigabit speeds. RED (Floyd and Jacobson 1993) has been deployed in some routers to probabilistically drop early and break synchronization, but RED still uses loss as the signal. Sally Floyd, K. K. Ramakrishnan, and David Black propose a richer interface: give the router a way to say “slow down” without killing a packet.
“ECN allows the router to signal congestion by marking rather than dropping packets, preserving the data stream while still conveying the congestion signal.” — RFC 3168 (Ramakrishnan et al. 2001)
What the pioneers saw: The loss signal is destructive — every congestion signal costs a retransmission. At high speed this is wasteful. Two unused bits in the IP header (the TOS/DS byte’s low-order bits) can carry an explicit signal: ECT(0)/ECT(1) (ECN-Capable Transport) says “I understand ECN”; CE (Congestion Experienced) is the router’s mark; the receiver echoes it in the ACK, and the sender responds exactly as to a loss.
What remained invisible: Two deployment failures. First, middlebox bleaching — firewalls, NATs, and old routers would see unfamiliar bits and clear them. The signal disappeared mid-path, without warning. Second, asymmetric fairness — an ECN sender in a shared queue backs off at shallow queue depth (early marks), while a loss-based sender in the same queue continues until the queue overflows. The ECN flow gets less bandwidth because it’s more polite, which creates a disincentive to deploy ECN.
10.3.1 The Solution: ECT/CE in the IP Header
The IP interface applies disaggregation: the ToS byte’s low-order bits now separate “drop on congestion” from “mark on congestion.” The AQM interface remains RED-shaped (mark probability rises with queue depth), but the action is a bit-flip, not a packet kill. Transport’s response is unchanged: one CE mark = treat as one loss. This preserves backward compatibility — any sender that doesn’t know ECN still gets dropped.
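A sketch of the router side, using the RFC 3168 codepoint values. The marking probability itself would come from a RED-style curve; the helper below only shows the mark-vs-drop decision and is illustrative:

```python
import random

NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 ECN field codepoints

def aqm_action(ecn: int, mark_prob: float, rng=random.random) -> tuple[int, bool]:
    """RED-shaped decision at the queue: returns (new_ecn_field, dropped).
    ECN-capable packets get a bit-flip to CE; legacy packets get dropped."""
    if rng() >= mark_prob:
        return ecn, False          # no congestion action for this packet
    if ecn in (ECT0, ECT1):
        return CE, False           # mark instead of kill
    return ecn, True               # Not-ECT sender: only loss can signal
```

The last branch is the backward-compatibility guarantee from the text: a sender that never sets ECT still gets the 1988 signal.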
10.3.2 Invariant Analysis: ECN (2001)
| Invariant | Answer (2001) | Gap? |
|---|---|---|
| State | Loss OR mark = congestion | Marks still treated as binary |
| Time | Per-RTT, more frequent | Same response magnitude |
| Coordination | Still distributed | Asymmetric fairness in shared queue |
| Interface | 2 bits (ECT, CE) | Middleboxes bleach |
The Interface gap is the killer. Honda et al. (2011) measured that ~20 % of Internet paths had middleboxes that cleared ECN bits. Deployment stalled: the chicken-and-egg problem was fatal. A sender who turned on ECN gained no benefit unless both the router and receiver supported it and no middlebox rewrote the bits.
10.3.3 Environment → Measurement → Belief
| Layer | What Transport Has | What’s Missing |
|---|---|---|
| Environment | Queue depth at bottleneck | Still not direct |
| Measurement | CE marks + missing ACKs | Mark count per RTT not exposed |
| Belief | cwnd, halved on any mark | No proportional response |
The E→M gap shrank from 1 bit to 2 bits, but the signal semantics remained binary (“halve on anything”). To extract real information, the sender needs to know how many marks arrived in the last RTT, not just whether one did.
10.3.4 “The Gaps Didn’t Matter… Yet”
In 2001, typical RTTs were 50+ ms, speeds 100 Mbps, and marking rates low. Binary response to marks was fine. The gap became catastrophic when datacenters arrived: 10 Gbps, 100 µs RTT, marks every few packets.
10.4 Act 3: L4S — Dual-Queue Coupled AQM (2013–2023)
It’s 2013. DCTCP (2010) has demonstrated that in a controlled datacenter, a scalable congestion controller responding proportionally to mark fractions can achieve sub-millisecond queuing at 10 Gbps (Alizadeh et al. 2010). But DCTCP is unfair to classic TCP in a shared queue — its shallow-queue operation starves Reno. Bob Briscoe and Koen De Schepper pose the challenge: can we deploy DCTCP-style transport on the public Internet without breaking classic TCP?
“The root cause is not ECN itself, but that classic TCP and scalable TCP cannot share one queue. Separate them, couple them, and both can coexist.” — De Schepper & Briscoe, RFC 9330 (De Schepper and Briscoe 2023a)
What the pioneers saw: The problem is not the ECN bit — it’s that one queue serves two incompatible control laws. Classic TCP expects infrequent, deep-queue marks (mirroring loss). Scalable TCP (DCTCP, Prague) expects frequent, shallow-queue marks. Sharing a single FIFO starves one or the other. The solution is two queues with a coupling law that enforces fair bandwidth sharing.
What remained invisible: Deployment would still hinge on the same middlebox ecosystem that killed RFC 3168. L4S repurposes ECT(1) as a classifier (“this flow is scalable-CC”), which requires middleboxes to leave ECT(1) alone — a hope, not a guarantee.
10.4.1 The Solution: DualQ + ECT(1) + AccECN
L4S applies disaggregation at the queue (Briscoe, De Schepper, Bagnulo, et al. 2023; De Schepper and Briscoe 2023b): one queue for scalable flows (L queue, shallow target, frequent marks), one for classic flows (C queue, deep target, drops or rare marks). A coupling law links their marking probabilities: p_C = (p_L / k)². This mathematical coupling makes classic TCP’s sqrt-loss response match scalable TCP’s linear mark response at the same bandwidth share.
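A sketch of the coupling law, written from one internal base probability p′ as DualPI2 does (k = 2 is used as an example coupling factor):

```python
def coupled_probs(p_base: float, k: float = 2.0) -> tuple[float, float]:
    """DualPI2-style coupling from one internal base probability p':
    the L queue marks at p_L = k * p', the C queue drops/marks at
    p_C = p'^2 -- which is exactly p_C = (p_L / k)^2 from the text."""
    p_l = min(k * p_base, 1.0)
    p_c = p_base ** 2
    return p_l, p_c

# At p' = 0.1: scalable flows see frequent shallow marks (p_L = 0.2) while
# classic flows see rare deep signals (p_C = 0.01), yet share bandwidth fairly.
p_l, p_c = coupled_probs(0.1)
```

The square is the whole trick: classic TCP's rate goes as 1/√p_C and scalable TCP's as 1/p_L, so squaring one probability against the other makes the two control laws meet at the same rate.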
Transport applies closed-loop reasoning with a new sensor: Accurate ECN (AccECN, RFC 9341 (Briscoe, De Schepper, Bondarenko, et al. 2023))[^1] lets the receiver feed back the count of CE marks in the last RTT, not just presence. The sender updates cwnd ← cwnd × (1 − α/2), where α is an EWMA of the mark fraction — exactly DCTCP’s rule, now deployable end-to-end.
Classifier: a flow sets ECT(1) to declare “I will respond proportionally.” The DualQ router reads ECT(1) to steer the packet into the L queue.
10.4.2 Invariant Analysis: L4S (2023)
| Invariant | Answer (2023) | Gap? |
|---|---|---|
| State | Mark fraction α per RTT | Requires AccECN deployment |
| Time | Sub-ms queue target in L queue | Classic queue still slow |
| Coordination | DualQ scheduler + coupling law | Cross-AS deployment fragile |
| Interface | ECT(1) classifier + multi-bit feedback | Middlebox bleaching risk |
The Coordination gap is the remaining one. L4S works when every bottleneck on the path is DualQ-capable. On a path that traverses three ASes and one of them runs classic FIFO, scalable flows lose their latency benefit on that hop.
10.4.3 Comparison: Before and After
| What Changed | Before L4S (2001 ECN) | After L4S (2023) |
|---|---|---|
| Queue structure | Single FIFO | Dual queue, coupled |
| Mark semantics | Binary per RTT | Fractional (counted) |
| Response to marks | Halve cwnd | Proportional reduction |
| Classic-TCP fairness | Broken (ECN starves) | Preserved by coupling law |
| Target queuing delay | Tens of ms | Sub-millisecond (L queue) |
10.4.4 Environment → Measurement → Belief
| Layer | What Transport Has | What’s Missing |
|---|---|---|
| Environment | Per-queue depth at DualQ | Depth on non-DualQ hops |
| Measurement | Mark fraction α, RTT, loss | Per-hop diagnostics |
| Belief | Capacity + queue proximity estimate | Assumes path is DualQ throughout |
The E→M gap is now structurally filtered rather than physically limited: the signal is honest and frequent, but it is filtered through the deployment state of the path. Measurement quality tracks administrative consolidation.
10.4.5 “The Gaps Didn’t Matter… Yet”
L4S works in datacenter-like conditions and in controlled access deployments (Nokia DualQ at cable modems, Apple scalable CC in iOS 16). The remaining gap — partial deployment — matters only for flows crossing non-L4S ASes. For intra-AS traffic, the binding constraint has shifted from “interface is too narrow” to “interface is not yet universal.”
10.5 Act 4: PowerBoost — Invisible Policy Breaks Transport (2008–2016)
It’s 2011. Srikanth Sundaresan and collaborators, instrumenting home gateways for the FCC broadband measurement study, observe anomalous transport behavior. Comcast cable customers see throughput of 25 Mbps for the first ~8 seconds of an upload, then an abrupt cliff to 12 Mbps — with no loss event, no marking, nothing transport can see (Sundaresan et al. 2011).
“PowerBoost allows ISPs to advertise peak rates they cannot sustain. The mechanism is a token bucket invisible to transport: packets flow unmolested during the burst, then the bucket drains and the shaper buffers indefinitely — or the policer drops without feedback.” — Sundaresan et al., 2011 (Sundaresan et al. 2011)
What the ISPs saw: A marketing tool. Advertise 25 Mbps peak, provision 12 Mbps sustained, statistical multiplexing absorbs the gap. For web browsing and small file transfers, users see the peak. For sustained uploads, they see the sustained rate. The token bucket[^2] is transparent — it’s just a shaper.
What remained invisible to the ISPs: TCP’s closed loop assumes the bottleneck is a FIFO queue with loss-based or mark-based signaling. A token bucket is a third control loop invisible to TCP. Bauer et al. (2011) and Flach et al. (2016) quantified the damage: policer-induced loss was 6× higher than shaped-traffic loss, and TCP retransmission overhead climbed sharply (Flach et al. 2016).
10.5.1 The Mechanism
A token bucket has depth PBS (peak burst size) and refill rate MSTR (maximum sustained traffic rate). When the bucket is full, the sender transmits at its offered rate R. When R > MSTR, the bucket depletes in time D = PBS / (R − MSTR). After D seconds, one of two things happens:
- Shaper mode: excess packets queue inside the modem. Transport sees inflated RTT but no loss. The belief model mistakes the shaper for a congested bottleneck. Bufferbloat.
- Policer mode: excess packets are dropped without backpressure. Transport sees loss and halves cwnd. But the drops are deterministic (every packet above MSTR), and TCP’s stochastic backoff misses the true available rate. Utilization collapses.
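The depletion time D = PBS / (R − MSTR) can be checked against the trace from the Sundaresan study. A minimal sketch — the 13 MB bucket size below is inferred from the text's numbers (13 Mbps of excess for ~8 s), not a measured Comcast parameter:

```python
def depletion_time(pbs_bytes: float, mstr_bps: float, offered_bps: float) -> float:
    """Seconds until a full token bucket drains: D = PBS / (R - MSTR)."""
    excess_bps = offered_bps - mstr_bps
    if excess_bps <= 0:
        return float("inf")  # offered rate fits the refill; bucket never drains
    return pbs_bytes * 8 / excess_bps

# 25 Mbps offered against a 12 Mbps sustained rate with ~13 MB of burst
# credit reproduces the ~8-second cliff that transport cannot see coming.
d = depletion_time(pbs_bytes=13e6, mstr_bps=12e6, offered_bps=25e6)
```

Nothing in this computation is visible to TCP: the bucket state lives entirely on the ISP side of the interface.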
10.5.3 Environment → Measurement → Belief
| Layer | What Transport Has | What’s Missing |
|---|---|---|
| Environment | Token bucket state + queue state | Token count never exposed |
| Measurement | RTT, loss, throughput samples | Shaper and congestion look identical |
| Belief | “Network is congested at 12 Mbps” | Wrong — network is rate-limited |
The E→M gap here is structurally filtered by policy: the signal is shaped by a party whose incentives diverge from both transport and AQM. This is the “tussle” (Clark et al. 2005) playing out in the measurement layer.
10.5.4 Why the Fix Is Hard
Three fixes exist, none widely deployed:
- Expose the rate limit via the socket API — requires ISP cooperation.
- Deploy CoDel at the token bucket — keeps the queue small and lets transport see the delay signal honestly.
- Mark via ECN at bucket depletion — requires ECN everywhere.
The political economy of ISP-as-adversary-to-TCP prevents all three.
10.5.5 “The Gaps Didn’t Matter… Yet”
Until sustained uploads became common (video conferencing, cloud backup, live streaming), PowerBoost was invisible to most users. By 2020, with everyone working from home, the cliff was hitting tens of millions of users daily.
10.6 Act 5: QoS vs QoE — The Application-Layer Mismatch (2011+)
It’s 2011. Conviva and Akamai are instrumenting millions of video streaming sessions. Florin Dobrian and collaborators publish the first large-scale study linking network-layer metrics to user behavior (Dobrian et al. 2011). They find that a 1 % increase in buffer-stall ratio reduces average viewing time by ~3 minutes. No transport-layer metric predicts this directly.
“Users do not experience loss rate. They experience stalls, startup delay, and visual quality. These are application-layer constructs built from, but not reducible to, network-layer signals.” — Dobrian et al., 2011 (Dobrian et al. 2011)
What the pioneers saw: Network operators measure what is easy — packet loss, latency, jitter, throughput. These are QoS (Quality of Service) metrics. User engagement depends on QoE (Quality of Experience): stall frequency, startup time, bitrate stability, VMAF (Video Multimethod Assessment Fusion)-scored visual quality. The mapping from QoS to QoE is non-linear, application-specific, and often non-monotonic.
What remained invisible: The mapping is also user-population-specific. A 720p→480p bitrate switch annoys a 4K-TV viewer but is invisible to a phone viewer. A 200 ms startup delay is fine for Netflix binge-watching but fatal for live sports. No single QoS target optimizes QoE across populations.
10.6.1 Concrete Mappings
| Application | Dominant QoE Metric | QoS Signal Used |
|---|---|---|
| VoIP | MOS (loss-sensitive) | Loss + one-way delay |
| DASH video | Stall ratio + VMAF + switch count | Throughput estimate |
| Video conferencing | End-to-end latency | RTT, jitter |
| Web | Time-to-first-byte | RTT, loss |
| Gaming | Jitter + tail latency | RTT variance |
The same 50 ms RTT is excellent for DASH (irrelevant, buffering absorbs it), tolerable for VoIP, and unacceptable for competitive gaming.
10.6.2 Invariant Analysis: QoE at the Application Layer
| Invariant | App-Layer Answer | Gap? |
|---|---|---|
| State | Buffer level, bitrate history | Indirect QoS access (throughput only) |
| Time | Per-chunk or per-frame decisions | Lags network state by chunks |
| Coordination | App-only (ABR algorithm) | No network cooperation |
| Interface | Socket API (throughput + RTT) | No native QoE exposure |
The Interface gap is structural: the socket API exposes bytes-per-second, not queue depth or loss semantics. The application must reinvent measurement at the application layer — DASH players estimate bandwidth from segment download time; VoIP codecs infer loss patterns from jitter buffer underruns. This is redundant with what transport already measures, but unavoidable on the current API.
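A sketch of that reinvented measurement — a hypothetical EWMA bandwidth estimator and rate picker of the kind a DASH player runs. Function names, the gain, and the safety margin are all illustrative:

```python
def update_bw_estimate(est_bps: float, chunk_bytes: int, dl_seconds: float,
                       alpha: float = 0.25) -> float:
    """Fold one segment download into an EWMA throughput estimate --
    the application re-deriving what transport already measured."""
    sample_bps = chunk_bytes * 8 / dl_seconds
    return (1 - alpha) * est_bps + alpha * sample_bps

def pick_bitrate(est_bps: float, ladder_bps: list[int], safety: float = 0.8) -> int:
    """Highest rung of the bitrate ladder sustainable at a safety margin."""
    affordable = [b for b in ladder_bps if b <= est_bps * safety]
    return max(affordable) if affordable else min(ladder_bps)

est = update_bw_estimate(4e6, chunk_bytes=2_000_000, dl_seconds=2.0)
rate = pick_bitrate(est, [1_000_000, 2_500_000, 5_000_000, 8_000_000])
```

The safety margin is the tax the narrow API imposes: because the application cannot see queue depth, it must hedge its throughput estimate against invisible congestion.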
10.6.3 Environment → Measurement → Belief
| Layer | What the Application Has | What’s Missing |
|---|---|---|
| Environment | Network capacity, path quality, user display | Only partially observable |
| Measurement | Chunk download times, stall events, playback buffer | No direct QoS signals |
| Belief | “Safe bitrate is X Mbps for the next N seconds” | Prediction-heavy, noisy |
The E→M gap is structurally filtered by the API: the application can only measure what the socket layer exposes. Bridging requires either new APIs (SCReAM, L4S-aware endpoints) or external measurement infrastructure (Conviva, M-Lab).
10.6.4 “The Gaps Didn’t Matter… Yet”
When video was 480p and populations were tolerant, simple throughput-based adaptation was enough. 4K streaming, live sports, and cloud gaming have shrunk the tolerance, and the gap between QoS signals and QoE targets now dominates user experience.
10.7 Act 6: Datacenter Co-Design — DCTCP, HPCC, Swift (2010–2020)
It’s 2010. Mohammad Alizadeh and collaborators at Microsoft Bing control every switch, every NIC, every kernel in their datacenter. There is no middlebox. There is no cross-AS deployment. There is no ISP policy shim. The Coordination invariant is relaxed: one administrator owns both sides of every interface. What can the composition look like when co-design is permitted?
“Because we control both the switch and the endpoint, we can mandate ECN, we can require a specific congestion control algorithm, and we can measure outcomes in microseconds.” — Alizadeh et al., 2010 (Alizadeh et al. 2010)
10.7.1 DCTCP (2010): Scalable CC via Mark Fraction
DCTCP applies closed-loop reasoning with a richer sensor: the receiver echoes the count of CE marks in the last RTT, and the sender computes α = (1 − g) × α + g × F where F is the current RTT’s marking fraction. Window update: cwnd ← cwnd × (1 − α/2). When marking is 1 %, the cut is 0.5 %, not 50 %. The loop runs at full speed with tiny perturbations — the scalable-CC template (Alizadeh et al. 2010).
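The DCTCP rule in executable form — a sketch of one per-RTT step, with g = 1/16 as a representative gain:

```python
def dctcp_update(cwnd: float, alpha: float, marked: int, acked: int,
                 g: float = 1 / 16) -> tuple[float, float]:
    """One per-RTT DCTCP step: EWMA the mark fraction, cut proportionally."""
    frac = marked / acked if acked else 0.0
    alpha = (1 - g) * alpha + g * frac   # alpha tracks the extent of congestion
    if marked:
        cwnd *= 1 - alpha / 2            # 1% marking -> ~0.5% cut, not 50%
    return cwnd, alpha

# Converged at 1% marking: the cut is gentle, the loop stays at full speed.
cwnd, alpha = dctcp_update(cwnd=1000.0, alpha=0.01, marked=1, acked=100)
```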
10.7.2 TIMELY (2015): RTT as the Signal
Radhika Mittal and collaborators at Google argue that RTT, not ECN, is the most actionable signal: every endpoint already measures it, no switch changes needed (Mittal et al. 2015). TIMELY tracks RTT gradient (dRTT/dt) and reduces cwnd when the gradient turns positive. Pure endpoint control. No switch cooperation required — inverting DCTCP’s strategy.
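A sketch of the gradient controller. The production algorithm filters the gradient and normalizes by a minimum RTT; the constants and names here are illustrative, not TIMELY's actual parameters:

```python
def rtt_gradient(prev_rtt_us: float, rtt_us: float, dt_us: float) -> float:
    """Raw RTT gradient between consecutive completion events."""
    return (rtt_us - prev_rtt_us) / dt_us

def timely_step(rate_bps: float, grad: float,
                delta_bps: float = 10e6, beta: float = 0.8) -> float:
    """Additive increase while the queue drains or holds steady;
    multiplicative decrease proportional to the gradient as it builds."""
    if grad <= 0:
        return rate_bps + delta_bps
    return rate_bps * max(1 - beta * grad, 0.5)
```

The sensor is the derivative, not the level: a rising RTT means the queue is filling right now, even if absolute delay is still low.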
10.7.3 HPCC (2019): In-Band Network Telemetry
Yuliang Li and collaborators at Alibaba push further: every switch stamps its queue depth and link utilization into the packet header (Li et al. 2019). The sender receives per-hop state at every ACK. The CC algorithm becomes simple: compute path utilization U from the max over hops, adjust W ← W_base/U + W_AI. Convergence in one RTT. The signal has grown from 1 bit (ECN) to ~42 bytes per switch (INT metadata).
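The per-ACK update can be sketched once the INT metadata has been parsed into per-hop utilizations (W_AI is the additive-increase term; the values below are illustrative):

```python
def hpcc_window(w_base: float, hop_utils: list[float], w_ai: float) -> float:
    """HPCC-style update from in-band telemetry: the most-utilized hop
    defines path utilization U, and the window becomes W_base / U + W_AI."""
    u = max(hop_utils)   # the bottleneck hop dominates the path
    return w_base / u + w_ai

# Two healthy hops and one at 125% of target: back off within one RTT.
w = hpcc_window(w_base=100.0, hop_utils=[0.4, 1.25, 0.7], w_ai=2.0)
```

Because U is measured, not inferred, there is no probing phase: the sender jumps directly to the window that drives the bottleneck to its target.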
10.7.4 Swift (2020): Delay Decomposition
Gautam Kumar and collaborators at Google take delay-based CC into production at fleet scale (Kumar et al. 2020). Swift decomposes RTT into fabric delay (switch queues) and endpoint delay (NIC queues at hosts), each with its own target. When NIC delay rises, the fabric is fine — the host is overloaded, and backpressure is applied at the endpoint, not the switch.
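A sketch of the decomposition and the resulting placement decision. The delay targets and field names are hypothetical, not Swift's production values:

```python
def decompose_rtt(rtt_us: float, base_rtt_us: float,
                  nic_rx_us: float, nic_tx_us: float) -> tuple[float, float]:
    """Split measured RTT into endpoint delay (NIC queues, visible via
    local and remote timestamps) and fabric delay (switch queues)."""
    endpoint_us = nic_rx_us + nic_tx_us
    fabric_us = max(rtt_us - base_rtt_us - endpoint_us, 0.0)
    return fabric_us, endpoint_us

def backpressure_point(fabric_us: float, endpoint_us: float,
                       fabric_target_us: float = 50.0,
                       endpoint_target_us: float = 30.0) -> str:
    """Swift's key move: if NIC delay is high, throttle the host --
    the fabric may be perfectly fine."""
    if endpoint_us > endpoint_target_us:
        return "endpoint"
    if fabric_us > fabric_target_us:
        return "fabric"
    return "none"

fab, end = decompose_rtt(rtt_us=200.0, base_rtt_us=100.0,
                         nic_rx_us=20.0, nic_tx_us=40.0)
```

Two targets instead of one means two distinct actuators: cwnd for the fabric, host-side pacing for the NIC.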
10.7.5 Invariant Analysis: Datacenter Co-Design
| Invariant | Datacenter Answer | Gap? |
|---|---|---|
| State | Mark fraction (DCTCP) → INT (HPCC) → delay decomposition (Swift) | Grows with signal richness |
| Time | Per-ACK or per-RTT, µs scale | Limited by clock accuracy (Swift) |
| Coordination | Single admin, co-designed | Not portable to public Internet |
| Interface | Mandated end-to-end (ECN, INT, timestamps) | Cannot deploy across AS |
10.7.6 Environment → Measurement → Belief
| Layer | What Transport Has | What’s Missing |
|---|---|---|
| Environment | Switch queues, NIC queues, link rates | Fully observable within DC |
| Measurement | ECN marks / INT / RTT decomposition | Nothing material |
| Belief | Precise path-utilization estimate | Matches environment closely |
The E→M gap is nearly closed within the datacenter. That is the return on administrative consolidation: relaxing the Coordination invariant (one admin, co-designed stack) enables the State invariant to be saturated (signal richness limited only by engineering choices, not middlebox policy).
10.8 The Grand Arc: From Loss to Co-Design
10.8.1 The Evolving Interface
| Era | Signal | Interface Width | Who Owns Semantics |
|---|---|---|---|
| 1988 | Loss | 1 bit (dropped / delivered) | Router (drop policy) + Endpoint (interpretation) |
| 2001 | Loss + ECN | 2 bits (ECT/CE) | Standards-body agreement |
| 2010 (DC) | ECN mark fraction | EWMA over RTT | Single admin |
| 2015 (DC) | RTT gradient | Continuous | Endpoint-only |
| 2019 (DC) | Per-hop INT | ~42 B / switch | Single admin |
| 2020 (DC) | Delay decomposition | Continuous + host state | Single admin |
| 2023 | L4S (DualQ + AccECN) | Counted marks + classifier | Opt-in across admins |
10.8.2 Three Design Principles Applied Across the Arc
Disaggregation: Every redesign separates previously merged concerns. L4S disaggregates one queue into two. Swift disaggregates RTT into fabric and endpoint components. HPCC disaggregates “congestion” into per-hop utilization fields. Each separation creates a new interface; each interface is an opportunity for tighter control and a risk of ossification.
Closed-loop reasoning: Every step enriches the sensor side of the loop. AIMD’s sensor is “did the ACK arrive?” DCTCP’s sensor is “what fraction was marked?” HPCC’s sensor is “what is each hop’s utilization?” As the sensor grows richer, the control law becomes more precise and the loop gain becomes proportional rather than binary. The diagnostic — does it converge, how fast, where does it oscillate — is answered better at every stage.
Decision placement: The public Internet is locked into distributed decision-making — the ASes have no incentive to coordinate. The datacenter moves all the way to co-designed decisions: one admin decides both the CC algorithm and the AQM policy. L4S is the attempt to keep distributed decisions while still enriching the shared signal — a compromise that works only where both sides opt in.
10.8.3 The Dependency Chain
Each 1988 constraint spawns a failure, and each fix opens a new gap. The full constraint → failure → fix chain across all six acts is the flowchart that opens this chapter.
10.8.4 Pioneer Diagnosis Table
| Year | Pioneer | Invariant | Diagnosis | Contribution |
|---|---|---|---|---|
| 1988 | Jacobson | Interface | Loss is the only signal | AIMD, packet conservation |
| 1993 | Floyd, Jacobson | Coordination | Tail-drop synchronizes flows | RED (randomize) |
| 2001 | Ramakrishnan, Floyd, Black | Interface | Loss signal is destructive | RFC 3168 ECN (Ramakrishnan et al. 2001) |
| 2010 | Alizadeh et al. | State | Binary mark too coarse at DC speeds | DCTCP proportional response |
| 2011 | Sundaresan et al. | Coordination | Token buckets invisible to TCP | PowerBoost measurement |
| 2011 | Dobrian et al. | Interface | QoS fails to predict QoE | Stall-engagement model |
| 2012 | Gettys, Nichols | Time | Large buffers delay signal | Bufferbloat named (Gettys and Nichols 2012) + CoDel (Nichols and Jacobson 2012) |
| 2015 | Mittal et al. | State | ECN needs switch support; RTT doesn’t | TIMELY (delay-based DC CC) |
| 2016 | Flach et al. | Coordination | Policers cause 6× loss | Internet-wide policer study |
| 2019 | Li et al. | State | 1-bit ECN loses information | HPCC (per-hop INT) |
| 2020 | Kumar et al. | State | NIC queues can be bottleneck | Swift (fabric/endpoint split) |
| 2023 | De Schepper, Briscoe | Interface | Shared queue breaks fairness | L4S DualQ + AccECN |
10.8.5 Innovation Timeline
```mermaid
flowchart TD
subgraph sg1["Loss-Only Era"]
A1["1988 — Jacobson: AIMD"]
A2["1993 — RED"]
A1 --> A2
end
subgraph sg2["Explicit Signals"]
B1["2001 — RFC 3168: ECN"]
B2["2010 — DCTCP (DC)"]
B3["2012 — CoDel + Bufferbloat named"]
B1 --> B2 --> B3
end
subgraph sg3["Datacenter Co-Design"]
C1["2015 — TIMELY"]
C2["2019 — HPCC (INT)"]
C3["2020 — Swift"]
C1 --> C2 --> C3
end
subgraph sg4["Public Internet Revival"]
D1["2023 — L4S RFCs 9330/9331/9332"]
D2["2023 — AccECN RFC 9341"]
D1 --> D2
end
sg1 --> sg2 --> sg3 --> sg4
```
10.8.6 The Bidirectional Coupling Picture
Transport and AQM read and write one shared signal at the bottleneck. The signal is produced by one loop and consumed by the other. Across the six acts, the signal has grown from 1 bit to per-hop structured telemetry. The composition has moved from “independent loops that happen to share a queue” to “co-designed loops with a shared state representation.”
10.9 Generative Exercises
1. Suppose a future router can stamp a single 16-bit field into every packet, chosen by the operator. You may not change transport or AQM algorithms — only the signal semantics. What 16 bits maximize the information available to transport? Consider: queue depth (how many bits?), time-since-last-drain (how many bits?), link utilization (how many bits?), per-flow fairness hint (how many bits?). Justify your bit budget.
2. A flow crosses four hops: the first and last are DualQ (L4S-capable); the middle two are classic FIFO. Predict what the sender observes. Does ECT(1) survive? Are marks frequent or sparse? Does the scalable CC algorithm still work, or does it collapse to DCTCP-in-a-shared-queue fairness? Design a detection heuristic that lets the sender fall back to classic TCP behavior when it suspects a non-DualQ hop.
3. You have a cable modem connection. You run iperf and observe throughput of 50 Mbps for 12 seconds, then a drop to 20 Mbps, then (8 seconds later) a further drop to 10 Mbps. RTT climbs from 15 ms to 180 ms during the second drop. Construct the simplest token-bucket model that explains this trace. What are PBS, MSTR, and R for each stage? If you could add one ECN-marking point at the ISP edge, where would you place it to let TCP adapt smoothly?
10.10 References
[^1]: Accurate ECN (AccECN, RFC 9341) extends the original ECN mechanism by feeding back the exact number of CE-marked packets (rather than a single binary ECN-Echo bit). This gives the sender a proportional congestion signal — enabling fine-grained rate adjustments instead of the blunt halving that standard ECN triggers.

[^2]: A token bucket accumulates tokens at a constant rate (the committed information rate). Each packet consumes tokens proportional to its size. If tokens are available, the packet passes; if not, it is queued or dropped. The bucket depth (burst size) controls how much traffic can exceed the committed rate in a burst.