Who Gets Served Next? Scheduling Disciplines

For three lectures, applications did all the work. Netflix built a 60-second buffer and a sophisticated ABR loop to absorb a network that offered no guarantees. Zoom reduced that buffer to 50 milliseconds, added FEC, adaptive codecs, and a global relay network — all to compensate for a network that treats every packet identically. Cloud gaming pushed the time constraint below 50 milliseconds and still the network offered nothing in return. Every technique we studied — buffering, bitrate adaptation, jitter management, loss concealment — was the application compensating for the network’s indifference.

Today we go inside the router.

A router receives packets on input links and forwards them on output links. When packets arrive faster than the output link can transmit — and they always do, in bursts — the excess accumulates in a buffer. That buffer is finite. Someone must decide: which packet gets transmitted next? That is the scheduling question, and it is the subject of this lecture.

This is not a subplot of transport. Queue management is a peer system to TCP — it creates the environment transport operates in, and transport’s signals reshape queue behavior in return [4]. The two co-evolve, but neither is subordinate to the other. We will spend three lectures inside this system: today on scheduling (who gets served next?), L14 on active queue management (when should packets be dropped?), and L15 on the coupling between queues and transport (how the two systems form a single feedback loop).

The anchor for everything that follows: a finite buffer at a shared link [4]. This is architectural, not temporary. Routers have fixed memory. Traffic is bursty. The buffer fills. Something must give. How you manage what gives — that is queue management.

Act 1: The default — FIFO

A queue at a bottleneck

Picture a router connecting a campus network (1 Gbps) to an ISP uplink (100 Mbps). Traffic can arrive up to 10 times faster than it can leave. Packets accumulate. The router stores them in a buffer and transmits them in the order they arrived. This is FIFO: First In, First Out [1][4].

FIFO requires almost no state. A pointer to the head of the queue (next packet to transmit) and a pointer to the tail (where new arrivals go). No per-flow tracking. No classification logic. No weights, no priorities. A single queue, served in arrival order.

FIFO is work-conserving: the link never sits idle when packets are available. Every bit of capacity is used. From a utilization standpoint, FIFO is optimal — it wastes nothing.

And FIFO is the default everywhere. The vast majority of router interfaces on the Internet use FIFO scheduling today [1][4]. It is cheap, simple, and fast. At 100 Gbps line rates, where a router may process hundreds of millions of packets per second, “cheap and fast” matters enormously.

The VoIP packet and the Netflix chunk

So what is wrong with FIFO? Consider this scenario.

A Netflix client downloads a 4-megabyte video segment. That segment is broken into roughly 2,700 packets (each about 1,500 bytes). Those packets arrive at the bottleneck router in a burst. Right behind them — literally one packet later — a VoIP packet arrives. It is 200 bytes. It carries 20 milliseconds of someone’s voice. It has a hard deadline: if it does not reach the receiver within 150 milliseconds end-to-end, the conversation degrades [4].

Under FIFO, the VoIP packet waits behind all 2,700 Netflix packets. At 100 Mbps, each 1,500-byte packet takes 0.12 ms to transmit. The VoIP packet waits:

2,700 packets x 0.12 ms = 324 milliseconds

324 milliseconds of queuing delay alone. The VoIP budget for the entire end-to-end path is 150 ms. This single queue exceeds that budget by more than a factor of two. The call is unusable, not because the network lacks capacity, but because the scheduler is indifferent to urgency.
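The arithmetic above can be checked with a short sketch. The function names are hypothetical, and the model assumes every packet ahead of the VoIP packet is the same size and that the link transmits continuously.

```python
def transmission_time_s(packet_bytes, link_bps):
    """Time to put one packet on the wire, in seconds."""
    return packet_bytes * 8 / link_bps


def fifo_wait_s(packets_ahead, packet_bytes, link_bps):
    """Queuing delay for a packet that arrives behind `packets_ahead` packets."""
    return packets_ahead * transmission_time_s(packet_bytes, link_bps)


# The scenario from the text: 2,700 packets of 1,500 bytes on a 100 Mbps link.
wait = fifo_wait_s(packets_ahead=2700, packet_bytes=1500, link_bps=100e6)
print(f"{wait * 1000:.0f} ms")  # 324 ms
```

Halving the segment size or doubling the link rate halves the wait, but no realistic tuning brings a burst of this size under the 150 ms budget.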

FIFO’s fairness problem

Now consider fairness. Two flows share the bottleneck link: Flow A sends 1,000 packets per second. Flow B sends 10 packets per second. Under FIFO, both flows wait in the same queue. Flow A’s packets dominate the queue. Flow B’s packets — arriving infrequently — land behind Flow A’s long train. Flow B’s latency is determined by Flow A’s sending rate, not its own.

Is this fair? In one narrow sense, yes: every packet waits its turn, no packet gets special treatment. FIFO is fair on a per-packet basis [4].

But per-packet fairness is not per-flow fairness. Flow A accounts for about 99% of the packets in the queue (1,000 of every 1,010). Flow B, despite being a well-behaved lightweight flow, suffers the same queuing delay as if it were part of the congestion. FIFO punishes the innocent along with the aggressive.

Pause and reflect

FIFO’s coordination answer is: order is destiny [4]. The scheduler decides nothing about fairness, priority, or application requirements. The result is deterministic but blind. It is the scheduling equivalent of ALOHA from L5 — the simplest possible protocol, with no intelligence about who is transmitting or why.

And like ALOHA, FIFO’s simplicity creates a failure mode that motivates the next generation.

Act 2: Priority queuing — let important traffic cut the line

The idea

The VoIP-behind-Netflix problem has an obvious fix: let the VoIP packet go first. Instead of one queue, maintain multiple queues at different priority levels. The scheduler always serves the highest-priority non-empty queue first. Within each priority level, packets are served FIFO [1][4].

A typical configuration:

Priority	Queue	Traffic type
High	Queue 1	VoIP, emergency services, network control
Medium	Queue 2	Interactive web, video conferencing
Low	Queue 3	Bulk transfer, email, software updates

When a packet arrives, the router classifies it, typically using the DiffServ code point (DSCP), a 6-bit field in the IP header defined for exactly this purpose, and places it in the appropriate queue [1]. The scheduler checks Queue 1 first. If it has packets, transmit one. If not, check Queue 2. If that is also empty, check Queue 3.

Now the VoIP packet does not wait behind 2,700 Netflix packets. It enters the high-priority queue, gets served immediately (or after at most one packet currently being transmitted), and reaches the receiver with minimal queuing delay.
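A minimal sketch of strict priority scheduling, assuming three queues and a hypothetical DSCP-to-queue policy map. EF (46) and AF41 (34) are standard code points, but which queue each maps to is an operator choice invented here for illustration.

```python
from collections import deque

# Hypothetical policy map: DSCP value -> queue index (0 is highest priority).
DSCP_TO_QUEUE = {46: 0, 34: 1}
LOWEST = 2  # everything else is treated as bulk traffic


class StrictPriorityScheduler:
    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, dscp):
        """Classify by DSCP and append to that priority's FIFO queue."""
        self.queues[DSCP_TO_QUEUE.get(dscp, LOWEST)].append(packet)

    def dequeue(self):
        """Serve the highest-priority non-empty queue; None if all are empty."""
        for q in self.queues:
            if q:
                return q.popleft()
        return None


sched = StrictPriorityScheduler()
for i in range(3):
    sched.enqueue(f"netflix-{i}", dscp=0)
sched.enqueue("voip-0", dscp=46)  # arrives last, marked EF
served = [sched.dequeue() for _ in range(4)]
print(served)  # ['voip-0', 'netflix-0', 'netflix-1', 'netflix-2']
```

The `dequeue` loop always restarts from queue 0, which is what makes the discipline strict: a lower queue is served only when every higher queue is empty.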

The coordination answer changes

Notice what changed in the four-invariant reading. Under FIFO, the Coordination answer was “no one decides” — arrival order determines everything. Under priority queuing, the answer is: the network operator decides, at configuration time, based on application type [4].

This is explicit coordination. Someone — a network engineer — configures the priority map. That map reflects a policy: “voice matters more than video, video matters more than bulk transfer.” The policy is static, configured in advance, and applied uniformly.

The starvation problem

Priority queuing works beautifully when high-priority traffic is light. VoIP calls use 64 kbps each. Even a hundred concurrent calls consume 6.4 Mbps — a small fraction of a 100 Mbps link. The high-priority queue is rarely congested, and low-priority traffic gets served in the gaps.

But what if high-priority traffic is heavy? Suppose a misconfigured server floods the network with packets marked as high priority. Or suppose an attacker deliberately marks bulk traffic as high priority. Or suppose legitimate high-priority traffic simply exceeds expectations — a crisis drives a surge of VoIP calls.

In any of these cases, the high-priority queue is always full. The scheduler serves it first — always. Queue 2 and Queue 3 never get served. Low-priority packets wait indefinitely. They starve [1][4].

Starvation is not a bug in the implementation; it is a feature of the design. Strict priority means “always serve the highest priority.” Always means always — even if it means other traffic never gets through.

Can we fix starvation?

One approach: priority with rate limiting. Guarantee that high-priority traffic cannot exceed a certain rate (say, 20% of link capacity). If it tries, excess high-priority packets are dropped or demoted. This prevents starvation but introduces a new problem: you must correctly estimate the right rate limit for each priority class. Too low, and high-priority traffic gets dropped during legitimate peaks. Too high, and you have not solved starvation.
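The text does not say how the rate limit is enforced. One common mechanism is a token bucket, sketched here as an assumption rather than as the method the lecture has in mind. The class name, the 3,000-byte burst allowance, and the choice to pass timestamps explicitly are all illustrative.

```python
class TokenBucket:
    """Admit packets at up to `rate_bps` on average, with a bounded burst."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # refill rate in bytes per second
        self.capacity = burst_bytes  # maximum tokens the bucket can hold
        self.tokens = burst_bytes    # start full
        self.last = 0.0              # time of the previous update, in seconds

    def allow(self, now, size_bytes):
        """Return True if the packet conforms; otherwise drop or demote it."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


# Cap the high-priority class at 20% of a 100 Mbps link (20 Mbps).
bucket = TokenBucket(rate_bps=20e6, burst_bytes=3000)
burst = [bucket.allow(now=0.0, size_bytes=200) for _ in range(16)]
print(burst.count(True))  # 15 packets fit in the 3,000-byte burst
```

The estimation problem described above shows up as two parameters: `rate_bps` sets the long-term share, and `burst_bytes` sets how much legitimate burstiness is tolerated before packets are dropped or demoted.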

Another approach: abandon strict priority entirely and give everyone a turn. That is the next step.

Pause and reflect

Priority queuing trades one problem for another. FIFO was indifferent — it could not distinguish VoIP from Netflix. Priority queuing is discriminating but rigid — it can starve entire traffic classes. The coordination evolved from “no one decides” to “the operator decides statically,” but static decisions break under dynamic traffic.

We need a scheduler that is fair — one that guarantees every flow makes progress, regardless of what other flows are doing.

Act 3: Round-robin — everybody gets a turn

The idea

Round-robin (RR) maintains a separate queue for each flow (or each class). The scheduler visits each non-empty queue in turn, transmitting one packet from each before moving to the next [1][4].

Queue A:  [pkt] [pkt] [pkt] [pkt]
Queue B:  [pkt] [pkt]
Queue C:  [pkt]

Transmission order: A, B, C, A, B, A, A

Every flow gets its turn. No flow starves. If one flow is sending aggressively and another is sending lightly, the light flow still gets served on every round. This is a fundamental improvement over both FIFO (no fairness) and priority queuing (unfair to low-priority flows).

The coordination answer is: take turns [4]. Simple, intuitive, and fair in a specific sense — every flow gets an equal number of packets transmitted per round.
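The transmission order in the example can be reproduced with a short sketch. It assumes the queues are fixed for the duration of the run (no new arrivals), and the function name is hypothetical.

```python
from collections import deque


def round_robin(queues):
    """Serve one packet from each non-empty queue per round; return the order."""
    order = []
    while any(queues.values()):
        for flow, q in queues.items():
            if q:
                q.popleft()
                order.append(flow)
    return order


queues = {
    "A": deque(["pkt"] * 4),
    "B": deque(["pkt"] * 2),
    "C": deque(["pkt"] * 1),
}
order = round_robin(queues)
print(order)  # ['A', 'B', 'C', 'A', 'B', 'A', 'A']
```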

The packet-size problem

But “equal packets” is not “equal bytes.” Consider two flows:

Flow A sends 1,500-byte packets (standard Ethernet MTU).
Flow B sends 500-byte packets (small interactive messages).

Round-robin gives each flow one packet per round. But Flow A transmits 1,500 bytes per turn, while Flow B transmits 500 bytes per turn. Flow A gets three times the bandwidth [4].

The scheduler is fair in packet count but unfair in byte count. A flow that uses large packets gets a disproportionate share of the link. This is not a contrived edge case — different applications routinely use different packet sizes. Bulk transfers use maximum-sized packets (1,500 bytes). VoIP uses small packets (200 bytes). DNS queries use small packets. ACKs use tiny packets (40-60 bytes).

Per-packet fairness is not the fairness we want. We want per-byte (or per-bit) fairness: each flow should get an equal share of the link’s bandwidth, regardless of how it packetizes its data.
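To make the byte-level unfairness concrete, here is a small sketch that computes each flow's share of the link under packet-based round-robin. It assumes every flow is always backlogged and uses a single fixed packet size per flow. The function name is hypothetical.

```python
def rr_byte_shares(packet_sizes):
    """Bandwidth share per flow when each flow sends one packet per round."""
    bytes_per_round = sum(packet_sizes.values())
    return {flow: size / bytes_per_round for flow, size in packet_sizes.items()}


shares = rr_byte_shares({"A": 1500, "B": 500})
print(shares)  # {'A': 0.75, 'B': 0.25}
```

Flow A ends up with 75% of the link and Flow B with 25%, the three-to-one ratio described above, even though both flows receive the same number of turns.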

Pause and reflect

Round-robin solved starvation but introduced a new unfairness. The problem is that the unit of service (one packet) does not correspond to the unit of resource (one byte of bandwidth). We need a scheduler that accounts for packet sizes.

This is the same pattern we have seen throughout the course. ALOHA was simple but unfair (large frames monopolized the channel). Slotted ALOHA improved efficiency but not fairness. CSMA/CA added carrier sensing but still had capture effects. Each generation fixed one problem and revealed the next. Scheduling follows the same progression.

Act 4: Weighted Fair Queuing — proportional shares

The ideal: bit-by-bit fairness

Imagine an impossible scheduler: it transmits one bit from each flow in turn. Flow A transmits one bit, then Flow B transmits one bit, then Flow C, and so on. Every flow gets exactly the same number of bits per unit time, regardless of packet sizes. This is perfect bitwise fairness [2].

Of course, this is physically impossible — you cannot transmit a fraction of a packet. Networks are packet-switched; the atomic unit of transmission is a packet. But the bit-by-bit scheduler gives us a reference point: if we could compute which packet the ideal bit-by-bit scheduler would finish transmitting first, we could serve that packet first in the real system.

This is exactly the insight of fair queuing, proposed by Demers, Keshav, and Shenker in 1989 [2]. Weighted Fair Queuing (WFQ) generalizes it by giving each flow a weight.

How WFQ works

WFQ maintains a separate queue for each flow. Each flow is assigned a weight $w_i$ reflecting its fair share of the link. The scheduler uses a concept called virtual time — a clock that advances not by wall-clock seconds but by the number of bits the link has transmitted, normalized by the weights of active flows [2][4].

For each arriving packet, WFQ computes a virtual finish time: the time at which this packet would finish transmission in the ideal bit-by-bit scheduler. The scheduler then serves queues in order of their packets’ virtual finish times — the packet that would finish first in the ideal system gets transmitted first in the real one.
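One common way to write the finish-time rule is sketched below, using standard fair-queuing notation rather than notation taken from this lecture. Let $a_i^k$ and $L_i^k$ be the arrival time and length of the $k$-th packet of flow $i$, $V(t)$ the virtual time, $C$ the link rate, and $B(t)$ the set of backlogged flows. All of these symbols except $w_i$ are assumptions introduced for illustration.

$$
S_i^k = \max\left(F_i^{k-1},\; V(a_i^k)\right), \qquad
F_i^k = S_i^k + \frac{L_i^k}{w_i}, \qquad
\frac{dV}{dt} = \frac{C}{\sum_{j \in B(t)} w_j}
$$

The scheduler transmits the queued packet with the smallest $F_i^k$. A larger weight $w_i$ shrinks the increment $L_i^k / w_i$, so that flow's packets finish earlier in virtual time and are served more often.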

The effect: over any sufficiently long interval, each flow $i$ receives a fraction of the link bandwidth proportional to $w_i / \sum w_j$ (where the sum is over all active flows). A flow with weight 2 gets twice the bandwidth of a flow with weight 1 [2].
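The ordering rule can be sketched for one special case: all packets are already queued at time 0, so virtual time at arrival is 0 and each flow's finish tags are simply running sums of length divided by weight. This is a simplification, not a full WFQ implementation, because it does not track virtual time across later arrivals. The function name and the tie-breaking rule are assumptions.

```python
import heapq


def wfq_order(flows):
    """flows: {name: (weight, [packet sizes in bytes])}, all queued at time 0.

    Returns flow names in the order their packets are transmitted.
    """
    heap = []
    seq = 0  # tie-breaker: equal finish tags are served in insertion order
    for name, (weight, sizes) in flows.items():
        finish = 0.0  # previous finish tag for this flow
        for size in sizes:
            finish += size / weight  # finish tag grows by length / weight
            heapq.heappush(heap, (finish, seq, name))
            seq += 1
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


flows = {"A": (1, [1500, 1500]), "B": (1, [500] * 6)}
order = wfq_order(flows)
print("".join(order))  # BBABBBAB
```

With equal weights, Flow B sends three 500-byte packets for every 1,500-byte packet from Flow A, so both flows move 3,000 bytes. The heap is the sorted structure that gives WFQ its logarithmic per-packet cost.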

Max-min fairness

WFQ implements max-min fairness: it maximizes the minimum allocation to any flow, then recursively maximizes the next-smallest [2][4]. The result has a strong game-theoretic property — it is Pareto optimal: no flow can increase its allocation without decreasing another’s. No one can game the system by sending faster; the scheduler guarantees each flow its weighted share regardless of what other flows do.

This is a fundamentally different coordination answer from anything we have seen. FIFO: “no one decides.” Priority: “the operator decides statically.” Round-robin: “take turns by packet.” WFQ: “each flow gets a fair share proportional to its weight, enforced by the scheduler dynamically” [2][4].

The concrete difference

Let us revisit the VoIP-and-Netflix scenario. Suppose the bottleneck link is 100 Mbps. Three flows share it:

Flow V (VoIP): 64 kbps, weight 1
Flow N (Netflix): 5 Mbps, weight 1
Flow B (bulk transfer): greedy, weight 1

Under FIFO: Flow B floods the queue. VoIP waits behind Netflix and bulk packets. Latency is bounded only by the buffer size.

Under WFQ with equal weights: each flow gets 1/3 of the link (~33 Mbps each). But Flow V only needs 64 kbps and Flow N only needs 5 Mbps. Their unused share is redistributed to Flow B. In practice, Flow V gets its 64 kbps with minimal queuing delay (its queue is almost always empty because it sends so little relative to its share), Flow N gets its 5 Mbps comfortably, and Flow B gets the remaining ~95 Mbps [2].
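The redistribution of unused share can be sketched as progressive filling (often called water-filling), a standard way to compute a weighted max-min allocation. This computes the steady-state rates only and is not the per-packet scheduler. The function name is hypothetical, rates are in Mbps, and the greedy flow is modeled as infinite demand.

```python
import math


def max_min_allocation(capacity, demands, weights=None):
    """Weighted max-min fair rates for flows with the given demands."""
    n = len(demands)
    weights = weights or [1.0] * n
    alloc = [0.0] * n
    active = set(range(n))
    remaining = float(capacity)
    while active and remaining > 0:
        per_weight = remaining / sum(weights[i] for i in active)
        # Flows that want no more than their current share are fully satisfied.
        satisfied = {i for i in active if demands[i] <= per_weight * weights[i]}
        if not satisfied:
            # Everyone left wants more than the share: split what remains.
            for i in active:
                alloc[i] = per_weight * weights[i]
            break
        for i in satisfied:
            alloc[i] = demands[i]
            remaining -= demands[i]
        active -= satisfied
    return alloc


# VoIP (0.064 Mbps), Netflix (5 Mbps), greedy bulk transfer, on 100 Mbps.
alloc = max_min_allocation(100, [0.064, 5.0, math.inf])
print([round(x, 3) for x in alloc])  # [0.064, 5.0, 94.936]
```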

The VoIP packet no longer waits behind 2,700 Netflix packets. It has its own queue. Its virtual finish time is computed independently. It gets served promptly because its queue is short. WFQ provides isolation: one flow’s behavior cannot degrade another flow’s performance [2].

The cost: per-flow state

Here is the tradeoff. WFQ requires per-flow state [2][4]. For each flow, the router must maintain:

A separate queue
A weight
A virtual finish time for the head-of-line packet
The virtual time clock (a single clock shared by all flows on the link, not per-flow state)

The per-packet scheduling cost is O(log F), where F is the number of active flows: the scheduler must find the packet with the smallest virtual finish time, typically via a sorted data structure [2].

At a campus router with hundreds of flows, this is manageable. At a core Internet router carrying millions of simultaneous flows at 100 Gbps? Millions of queues. Millions of virtual finish times to maintain and sort. Hundreds of millions of packets per second, each requiring on the order of log2(1,000,000) ≈ 20 comparisons.

This is where the beautiful theory meets engineering reality.

Act 5: Deficit Round-Robin — a practical compromise

The O(1) insight

In 1996, Shreedhar and Varghese proposed Deficit Round-Robin (DRR) — a scheduling algorithm that achieves approximately the same fairness as WFQ but with O(1) per-packet processing cost [3].

The idea is elegant. Each flow gets a queue and a deficit counter. Each round, the scheduler adds a fixed quantum $Q$ (in bytes) to each flow’s deficit counter. The scheduler then visits each flow in round-robin order. For each flow, it transmits packets as long as the head-of-line packet’s size is less than or equal to the deficit counter. Each transmitted packet decreases the counter by its size. If the head-of-line packet is too large, the flow keeps its remaining deficit for the next round [3].

A concrete example with quantum $Q = 1000$ bytes:

Round 1:

Flow A (1,500-byte packets): deficit = 0 + 1,000 = 1,000. Head-of-line packet is 1,500 bytes. 1,000 < 1,500 — cannot transmit. Deficit stays at 1,000.
Flow B (500-byte packets): deficit = 0 + 1,000 = 1,000. Transmit one packet (500 bytes). Deficit = 500. Transmit another (500 bytes). Deficit = 0.

Round 2:

Flow A: deficit = 1,000 + 1,000 = 2,000. Transmit one packet (1,500 bytes). Deficit = 500.
Flow B: deficit = 0 + 1,000 = 1,000. Transmit two packets. Deficit = 0.

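Over two rounds: Flow A transmitted 1,500 bytes. Flow B transmitted 2,000 bytes. Not perfectly equal, but the deficit accumulates and self-corrects over time: in round 3, Flow A’s carried-over 500 bytes plus a fresh quantum cover a second 1,500-byte packet, so both flows have sent 3,000 bytes. As the number of rounds grows, each flow’s bandwidth converges to its fair share [3].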
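The worked example can be replayed with a minimal DRR sketch. It assumes both flows stay backlogged, uses hypothetical function names, and resets the deficit of a flow whose queue empties, as the DRR paper specifies, so that an idle flow cannot bank credit.

```python
from collections import deque


def drr(queues, quantum, rounds):
    """Run DRR for `rounds` rounds; return (bytes sent, deficit) per flow."""
    deficit = {flow: 0 for flow in queues}
    sent = {flow: 0 for flow in queues}
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue
            deficit[flow] += quantum
            # Send while the head-of-line packet fits in the deficit.
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                sent[flow] += size
            if not q:
                deficit[flow] = 0  # an emptied queue does not keep credit
    return sent, deficit


def make_queues():
    """Flow A: 1,500-byte packets. Flow B: 500-byte packets."""
    return {"A": deque([1500] * 4), "B": deque([500] * 8)}


print(drr(make_queues(), quantum=1000, rounds=2))
# ({'A': 1500, 'B': 2000}, {'A': 500, 'B': 0})
print(drr(make_queues(), quantum=1000, rounds=3))
# ({'A': 3000, 'B': 3000}, {'A': 0, 'B': 0})
```

The only per-flow work is one addition, one comparison per packet, and one subtraction, which is where the constant per-packet cost comes from.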

Why O(1) matters

DRR never sorts. It never computes virtual finish times. It visits each active flow once per round, adds a constant, compares, and transmits. The per-packet cost is constant — independent of the number of flows [3].

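At 100 Gbps, a 1,500-byte packet occupies the link for 120 nanoseconds (12,000 bits at 100 Gbps), and a minimum-size 64-byte packet for only about 5 nanoseconds. At the small-packet end there is no time for log-F comparisons. DRR’s O(1) cost is the reason it is deployable where WFQ is not.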

DRR sacrifices the exact packet-by-packet ordering of WFQ — it can be off by one packet’s worth of transmission per flow per round. But over any reasonable interval, the fairness properties converge. In practice, the approximation is close enough that DRR (and its variants) are the most widely deployed fair scheduling algorithms in real networks [3].

Act 6: The scalability-fairness tradeoff

A hierarchy of state

Step back and look at the progression:

Discipline	Per-flow state	Per-packet cost	Fairness guarantee
FIFO	None	O(1)	None — arrival order only
Priority	Per-class (few queues)	O(1)	Priority class gets preference; starvation risk
Round-robin	Per-flow queue	O(1)	Per-packet fair; byte-unfair
WFQ	Per-flow queue + virtual time	O(log F)	Max-min fair (per-byte)
DRR	Per-flow queue + deficit counter	O(1)	Approximately max-min fair

Each step up the ladder adds state. More state means more fairness. More state also means more memory, more computation, and more complexity at line rate [4].

This creates a tradeoff that is not merely practical but architectural. The amount of per-flow state a scheduler can maintain depends on where it sits in the network.

Internet core vs. cellular edge

Consider two deployment environments:

Internet core router (e.g., a Juniper or Cisco backbone router):

Line rate: 100 Gbps to 400 Gbps
Active flows: millions (every TCP connection, every UDP stream, every DNS query transiting the router)
Per-packet budget: about 120 nanoseconds for a 1,500-byte packet at 100 Gbps, and single-digit nanoseconds for minimum-size packets
Constraint: cannot maintain per-flow state for millions of flows at nanosecond timescales

Result: FIFO [4]. The Internet core uses FIFO almost exclusively. Not because FIFO is fair — we just spent an hour showing it is not — but because it is the only discipline that scales to the core’s demands. The core’s scheduling answer is forced by its deployment constraints.

Cellular base station (e.g., a 4G LTE eNodeB or a 5G gNodeB):

Line rate: hundreds of Mbps to a few Gbps (shared across all users in the cell)
Active flows: tens to low hundreds (each connected phone)
Per-packet budget: microseconds
Constraint: the base station already knows every user (they are authenticated and registered)

Result: WFQ (or proportional-fair scheduling) [4]. Cellular base stations use per-user scheduling as the default. They already maintain per-user state for authentication, handoff, and power control. Adding a per-user scheduling weight is trivial. The base station runs a centralized scheduler that allocates every time slot to a specific user, which closely matches the WFQ model.

The connection to medium access

This should sound familiar. In L5-L8, we traced medium access from ALOHA (no coordination, no state) through CSMA/CA (distributed, minimal state) to OFDMA (centralized, per-user state). The progression was driven by the same tradeoff: more state enables more efficient coordination, but more state is more expensive.

Scheduling disciplines follow the same arc:

Medium access	Scheduling
ALOHA — no coordination	FIFO — no fairness decisions
CSMA/CA — distributed, minimal state	Round-robin — per-flow turns, no byte accounting
OFDMA — centralized, per-user state	WFQ — per-flow weights, proportional allocation

The parallel is not a coincidence. Both medium access and scheduling are instances of the Coordination invariant: how do multiple competing entities share a finite resource? In medium access, the resource is the wireless channel. In scheduling, the resource is the output link. The tradeoff between simplicity and fairness is identical [4].

Act 7: Stochastic Fairness Queuing — hashing as compromise

The middle ground

What if you want some fairness at core-like speeds but cannot afford per-flow state for millions of flows? Stochastic Fairness Queuing (SFQ) offers a compromise [4].

SFQ maintains a fixed number of queues — say, 1,024 — and uses a hash function to map each flow to a queue. The scheduler then applies round-robin (or DRR) across the 1,024 queues. Flows that hash to the same queue share that queue’s allocation; flows in different queues are isolated from each other.

With 1,024 queues and 10,000 active flows, on average ~10 flows share each queue. Those 10 flows experience FIFO-like competition within their shared queue, but the 1,024 queues get fair treatment relative to each other. The result is approximate fairness — not as good as WFQ (some flows collide), but far better than pure FIFO (most flows are isolated from most other flows).

SFQ periodically changes its hash function to prevent the same flows from being permanently grouped together. This randomization ensures that collisions are temporary and that over time, every flow gets approximately its fair share.

The tradeoff: SFQ’s state scales with the fixed number of queues (1,024 queues, each with a deficit counter), not with the number of flows (potentially millions). It is deployable at high line rates. But it provides probabilistic, not deterministic, fairness guarantees.
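The hashing step can be sketched as follows. Real implementations use a fast non-cryptographic hash. This sketch substitutes a keyed BLAKE2b hash from Python's standard library purely for convenience, and the flow 5-tuples, the function name, and the integer salt are made up for illustration.

```python
import hashlib
from collections import Counter

NUM_QUEUES = 1024


def sfq_bucket(flow, salt, num_queues=NUM_QUEUES):
    """Map a flow 5-tuple to a queue index using a salted (keyed) hash."""
    data = repr(flow).encode()
    key = salt.to_bytes(8, "big")  # changing the salt reshuffles the mapping
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big") % num_queues


# 10,000 synthetic flows that differ only in source port.
flows = [("10.0.0.1", 10_000 + i, "192.0.2.7", 443, "tcp") for i in range(10_000)]

load = Counter(sfq_bucket(f, salt=1) for f in flows)
moved = sum(sfq_bucket(f, salt=1) != sfq_bucket(f, salt=2) for f in flows)

print(len(flows) / NUM_QUEUES)  # average flows per queue, a little under 10
print(max(load.values()))       # the most crowded queue holds more than average
print(moved)                    # flows that land in a different queue after re-salting
```

Changing `salt` plays the role of the periodic hash change: almost every flow lands in a different queue, so two flows that collided under one salt are unlikely to collide under the next.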

The grand arc: coordination answers constrain state

Today we traced a single question through five scheduling disciplines: who gets served next? The answer evolved:

FIFO: “Whoever arrived first.” No coordination. No fairness. But zero per-flow state — deployable anywhere.
Priority: “Whoever the operator designated as important.” Static coordination. Starvation risk. Minimal state.
Round-robin: “Everyone takes turns.” Fair in packet count, unfair in bytes. Per-flow queues but no byte accounting.
WFQ: “Everyone gets their proportional share, accounting for packet sizes.” Max-min fair. Per-flow virtual time. O(log F) per packet.
DRR: “Approximately everyone’s proportional share.” Nearly as fair as WFQ. O(1) per packet.

The progression reveals a fundamental tradeoff: fairness requires state, and state has a cost that depends on deployment context [4]. This is why the Internet core uses FIFO and cellular base stations use WFQ — not because one is better in the abstract, but because the deployment constraints (line rate, flow count, existing per-user state) determine what is feasible.

This is the Coordination invariant applied to the queue. The same invariant we saw in medium access (L5-L8), in transport (L3-L4), and in multimedia applications (L10-L12). Every time multiple entities compete for a shared resource, the system designer must choose: how much state am I willing to maintain, and how fair do I need to be?

Bridge to L14: the other half of the question

Today we answered “who gets served next?” — the scheduling question. But we left a question untouched: what happens when the buffer is full?

Under all the scheduling disciplines we discussed, when a packet arrives and the buffer has no room, the router drops the arriving packet. This policy is called tail-drop. It seems reasonable: the queue is full, and the new packet has no place to go.

But tail-drop has a devastating interaction with TCP. When the buffer fills, all flows experience loss simultaneously. Every TCP sender backs off at the same time. The buffer drains. Every sender ramps back up at the same time. The buffer fills again. All senders lose packets again, simultaneously. The result is global synchronization — a destructive oscillation where the link alternates between fully utilized (buffer full, every sender at max rate) and underutilized (buffer empty, every sender just backed off) [4].

Could the router be smarter? Could it drop packets before the buffer is full, to warn senders early? Could it choose which packets to drop, rather than always dropping the newest arrival? Could the drop decision itself become a signal — a way for the router to communicate with senders without any explicit protocol?

That is Active Queue Management — the subject of L14. If scheduling asks “who goes next?”, AQM asks “who gets dropped, and when?” Together, they define the router’s complete queue management policy.

References

[1] Kurose, J. F. and Ross, K. W. (2021). Computer Networking: A Top-Down Approach, 8th Edition. Pearson.

[2] Demers, A., Keshav, S., and Shenker, S. (1989). “Analysis and Simulation of a Fair Queueing Algorithm.” Proc. ACM SIGCOMM.

[3] Shreedhar, M. and Varghese, G. (1996). “Efficient Fair Queuing Using Deficit Round-Robin.” IEEE/ACM Transactions on Networking.

[4] A. Gupta, A First-Principles Approach to Networked Systems, Ch. 5: Queue Management, UC Santa Barbara, 2026.