13  Measurement, Management, and the Research Frontier


13.1 The Anchor: Measurement Disturbs, Observation Is Incomplete

Every networked system must observe itself to adapt. Transport infers congestion from ACKs. Routing learns topology from flooded advertisements. Queue management reacts to depth. These operational systems embed measurement inside themselves. But operators also need external visibility — the ability to observe what a system does, why it fails, whether it matches intent. That external measurement is its own engineering problem, with its own binding constraint.

The binding constraint is the Heisenberg problem of networking. Active measurement injects probes to get precise answers, but the probes consume bandwidth, are often treated differently from real traffic, and at scale become a denial-of-service themselves. Passive measurement observes existing traffic with zero injection overhead, but the observation is always incomplete — you see only what happened to flow through the vantage point, and only what the vantage point can record. Observation always perturbs, and perturbation always costs.

“Measuring the Internet is hard. The measurement instruments interact with the phenomenon, and what we see is always a sample rather than the truth.” — Vern Paxson, 1997 (Paxson 1999)

A second constraint sits on top of Heisenberg: the impossibility triangle of flexibility, scalability, and accuracy. At 100 Gbps, a single link carries on the order of 150-200 million minimum-size packets per second (roughly 12.5 GB/sec of raw data). Every practical measurement architecture surrenders at least one corner: keeping everything, processing everything, or answering arbitrary questions.
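The arithmetic is worth checking; a back-of-envelope sketch (assuming 64-byte minimum-size packets and 40-byte headers — round-number assumptions, not measured figures):

```python
# Back-of-envelope rates for a 100 Gbps link, assuming 64-byte
# minimum-size packets (ignoring Ethernet preamble and inter-frame gap).
LINE_RATE_BPS = 100e9
MIN_PKT_BYTES = 64
HEADER_BYTES = 40

bytes_per_sec = LINE_RATE_BPS / 8             # 12.5 GB/s of raw data
pkts_per_sec = bytes_per_sec / MIN_PKT_BYTES  # ~195 million packets/s

# Even keeping only a 40-byte header per packet costs ~7.8 GB/s.
header_bytes_per_sec = pkts_per_sec * HEADER_BYTES
```

Even "just the headers" of minimum-size packets is more than half the raw line rate, which is why every architecture in this chapter must discard something.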

Decision problems every measurement system must answer:

  1. Active probes or passive observation? — synthesize state with known properties, or observe real state with unknown confounders?
  2. What to sample when you can’t capture everything? — which packets, which counters, which intervals?
  3. Where to execute the query — at capture time or at query time? — push filtering into the data plane (cheap, inflexible) or store raw data and query later (flexible, expensive)?
  4. How to verify network behavior matches intent? — observe runtime outcomes (too late) or reason about configurations symbolically (pre-deployment)?

Retrospectively, the field recognized a fifth question — whether measurement should be a consumer of the data plane (an observer outside) or a participant in the data plane (an observer inside the packets). The answer evolved across successive generations, which we trace now.


13.2 Act 1: “It’s 1992. tcpdump Needs to Keep Up with Ethernet.”

It’s 1992. Ethernet runs at 10 Mbps. Network engineers at LBL want to debug TCP implementations, trace routing bugs, and watch protocols in flight. Steve McCanne and Van Jacobson look at the existing packet-capture tools — CSPF (the CMU/Stanford Packet Filter), Sun’s NIT, the Ultrix Packet Filter — and measure their overhead. The tools use stack-based virtual machines to evaluate filter predicates on each packet, and they are 3-20x slower than necessary. At 15,000 packets per second on a busy Ethernet, the user/kernel copy cost is crushing.

“A packet filter is a kernel-resident facility for deciding which incoming packets should be delivered to a monitoring application.” — McCanne & Jacobson, 1993 (McCanne and Jacobson 1993)

What the pioneers saw: CPU was the bottleneck. Memory was small. Ethernet was shared medium, and promiscuous mode exposed every packet on the wire, but every packet that crossed the user/kernel boundary cost a copy. The winning move was to push a compiled filter into the kernel, evaluate it per packet, and only wake the user process when a match fired.

What remained invisible: 100 Gbps line rates, programmable switches, and distributed vantage points lay beyond their horizon. The BPF model assumed a single host watching its own wire. It assumed CPU ran faster than Ethernet. Both assumptions would break.

McCanne and Jacobson applied disaggregation by separating the filter expression (user-space, high-level language) from the filter execution (kernel, compiled bytecode). They applied decision placement by pushing per-packet decisions as close to the NIC as the safety model allowed. They built a register-based virtual machine with a control-flow graph, so TCP-port-80 checks rejected non-IP packets at byte 12 without evaluating later fields. The kernel compiled user filters to bytecode and executed them per packet.
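The early-exit behavior of the compiled control-flow graph can be sketched in a few lines (illustrative Python mimicking the decision structure of a “tcp port 80” filter, not BPF bytecode or the real compiler):

```python
# Sketch of the control-flow idea behind a compiled "tcp port 80"
# filter: check the EtherType at byte 12 first and reject non-IP
# frames without ever touching later fields.
ETHERTYPE_IP = 0x0800

def tcp_port_80(frame: bytes) -> bool:
    # Bytes 12-13: EtherType. Non-IP frames exit here (the early reject).
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_IP:
        return False
    ihl = (frame[14] & 0x0F) * 4      # IPv4 header length in bytes
    if frame[23] != 6:                # byte 23: IP protocol field, 6 = TCP
        return False
    tcp = 14 + ihl                    # offset of the TCP header
    sport = int.from_bytes(frame[tcp:tcp + 2], "big")
    dport = int.from_bytes(frame[tcp + 2:tcp + 4], "big")
    return sport == 80 or dport == 80
```

The point is the ordering: the cheapest, most-discriminating check runs first, so the common case (a non-matching packet) costs a handful of comparisons.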

13.2.1 Invariant Analysis: BPF (1993)

| Invariant | BPF’s Answer (1993) | Gap? |
|---|---|---|
| State | Stateless per-packet predicate | No cross-packet memory |
| Time | Synchronous, per-packet | Bounded by packet arrival rate |
| Coordination | Kernel serializes, single host | No multi-vantage coordination |
| Interface | BPF bytecode contract | Fixed filter expressivity |

The gaps were real. BPF lacked the expressivity for “flows with more than 100 packets” because that requires state across packets. It was limited to single-host observation — unable to correlate across vantage points. It answered only “does this packet match this predicate?”, leaving aggregate queries like “which source IPs are heavy hitters?” to external tools.

13.2.2 Environment → Measurement → Belief: Passive Capture

| Layer | What BPF Has | What’s Missing |
|---|---|---|
| Environment | Packets on the wire at this host | Packets at other hosts |
| Measurement | Filtered, compiled-predicate matches | Aggregates, state, correlations |
| Belief | “These packets matched this filter” | “This is what the network did” |

The E→M gap was physically limited: the vantage point saw only its wire. Fixing the gap required different sensors (distributed capture, path-level probes), not better estimators.

13.2.3 “The Gaps Didn’t Matter… Yet.”

In 1993, networks were small, uniform, and lab-friendly. A single tcpdump on a congested segment could diagnose most problems because most traffic crossed that segment. The gap between “what this host sees” and “what the network did” was small. That gap would widen rapidly.


13.3 Act 2: “It’s 1996. The Internet Is Commercial. Nobody Knows What It’s Doing.”

It’s 1996. The Internet has exploded from research-lab curiosity to commercial backbone. AS counts jump from hundreds to thousands. Path heterogeneity is extreme: one flow traverses 15 ASes across three continents. TCP performance is wildly variable, and nobody knows why. Operators see loss, reordering, latency spikes — but the signals conflate “bug in the TCP implementation” with “pathological path” and “routing flap”.

Vern Paxson deploys the Network Probe Daemon (NPD) at 35 sites and runs 20,000 TCP transfers between them. What he finds is not a protocol bug — it is a methodology problem.

“The nature of the medium means the phenomenon under study is not stationary, the measurement instruments interact with the phenomenon, and what we see is always a sample rather than the truth.” — Vern Paxson, 1997 (Paxson 1999)

What the pioneers saw: The ground truth was unobservable. Every tool — tcpdump, ping, traceroute — had its own biases. Clocks at different vantage points drifted relative to each other. BPF itself dropped packets under load (the measurer had measurement bias). Periodic sampling aliased with periodic network phenomena. “Measure harder” was insufficient — reasoning about the measurement pipeline itself was required.

What remained invisible: At scale, even good methodology has vantage-point blind spots. No collection of 35 sites can see traffic engineering decisions inside an AS. Paxson’s methodology assumes probes see representative paths; in the policy-routing era, probes see what carriers choose to show.

Paxson applied closed-loop reasoning by treating the measurement instrument itself as part of the system under observation. He applied disaggregation by separating the raw signal (timestamps, sequence numbers) from the inferred property (loss rate, RTT distribution), acknowledging each layer’s error budget. His sampling design — Poisson-modulated probing — avoided aliasing with periodic network phenomena, because uniform sampling produces systematically biased estimates of non-stationary signals.
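The sampling idea can be sketched directly (a minimal illustration; the function name and parameters are invented, not NPD’s). Exponentially distributed gaps make the probe stream Poisson, which — by the PASTA property (Poisson Arrivals See Time Averages) — avoids locking onto any periodic network phenomenon:

```python
import random

# Sketch of Poisson-modulated probing: exponential inter-probe gaps
# around a target mean, instead of a fixed period that can alias with
# periodic network behavior (route refreshes, cron jobs, timers).
def poisson_probe_times(mean_gap_s: float, horizon_s: float, seed: int = 1):
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_gap_s)  # exponential gap
        if t > horizon_s:
            return times
        times.append(t)

# One hour of probes with a 10-second mean gap.
times = poisson_probe_times(mean_gap_s=10.0, horizon_s=3600.0)
```

A fixed 10-second timer fired at the same phase of every 10-second network cycle would see a systematically biased slice of it; the exponential gaps visit all phases.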

13.3.1 Invariant Analysis: Paxson Methodology (1997)

| Invariant | Paxson’s Answer (1997) | Gap? |
|---|---|---|
| State | Probabilistic — signals are distributions | Ground truth remains unknown |
| Time | Relative clocks with skew correction | Absolute cross-site sync hard |
| Coordination | Distributed vantage points, uncoordinated | No operator-internal visibility |
| Interface | TCP-visible signals (SEQ/ACK, RTT) | Limited to what protocol exposes |

The gaps were acknowledged, not erased. Paxson’s contribution was honesty about what the signal reveals and where it stays silent.

13.3.2 Environment → Measurement → Belief: Active Probing

| Layer | What NPD Has | What’s Missing |
|---|---|---|
| Environment | Internet paths between 35 hosts | Paths not traversing NPD |
| Measurement | Probe-triggered RTTs, loss, reorderings | Carrier-internal state |
| Belief | Distributions of end-to-end properties | Causation (“why did this path fail?”) |

The E→M gap here is accidentally noisy — signals are honest but imperfect. Fix with better estimators (robust statistics, outlier handling), which is exactly what Paxson did.

13.3.3 “The Gaps Didn’t Matter… Yet.”

In 1996, a few thousand probes per day at 35 sites was enough because link speeds were megabits and per-flow rates were modest. Sampling at human-observable timescales (minutes) produced usable belief. As links went to gigabit and above, the methodology held but the cost of comprehensive probing became unworkable.


13.4 Act 3: “It’s 2005. SNMP Polling Can’t See 100ms Events.”

It’s 2005. Backbone links are 10 Gbps and rising. Operators run SNMP pollers that query each router every 5 minutes for counters (bytes transmitted, packet loss, queue depth) (Harrington et al. 2002). This was adequate when traffic changed on human timescales. It is now catastrophically slow. DDoS attacks rise and fall in seconds. Microbursts last 100 ms. Flash crowds appear and dissolve between polls. The management system sees outages ten minutes after users do.

NETCONF/YANG (2006-2011) extended the model with declarative configuration (Enns et al. 2011), but the measurement model stayed pull-based: the management station asks, the device responds. The Coordination invariant is broken — devices are passive responders, not active observers.

What the operators saw: Polling was safe, simple, and scalable-in-count (one poller handles hundreds of devices). It was unsafe in time: 5-minute intervals miss everything interesting. Reducing the polling interval to 1 second crushed the management network with SNMP traffic.

What remained invisible: Operators assumed the bottleneck was query-response latency. The real bottleneck was the request model itself — devices sitting idle between queries, knowing about events nobody asked about.

Streaming telemetry, in the form of gNMI (the gRPC Network Management Interface), gRPC dial-in/dial-out, and OpenConfig YANG (Yet Another Next Generation) models (OpenConfig Working Group 2016), applied disaggregation by separating configuration (NETCONF push) from state reporting (streaming push). It applied decision placement by moving “what to report and when” from the management station (central) to the device (distributed). Devices decide what is worth reporting, based on local thresholds and schedules, and push proactively to collectors.
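The inversion can be sketched as a device-side emit loop (illustrative names and thresholds; a real gNMI subscription carries far richer schema-typed data):

```python
# Sketch of the push model: the device, not the collector, decides when
# a sample is worth emitting. It pushes on threshold crossings, plus a
# periodic heartbeat so the collector can distinguish "quiet" from "dead".
def emit_samples(queue_depths, threshold=80, heartbeat_every=10):
    """Yield (index, depth) on threshold crossings and heartbeats."""
    for i, depth in enumerate(queue_depths):
        if depth >= threshold or i % heartbeat_every == 0:
            yield (i, depth)

# A microburst at samples 2-3 is reported immediately, not at the
# next poll; quiet periods cost only one heartbeat per ten samples.
samples = list(emit_samples([5, 7, 95, 96, 8] + [6] * 20))
```

Under polling, the collector would have seen whichever single sample the poll happened to land on; here the interesting event itself triggers the report.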

13.4.1 Invariant Analysis: SNMP → Streaming Telemetry

| Invariant | SNMP (1988) | Streaming Telemetry (2016) |
|---|---|---|
| State | Device holds queried variables | Collector holds streams |
| Time | Wallclock polls every N minutes | Sequence numbers, lossless buffering |
| Coordination | Pull: management asks | Push: device decides to emit |
| Interface | MIB + request/response | YANG model + pub/sub |

The shift inverted every invariant simultaneously. State moved from device to collector. Time moved from wallclock polls to sequence-numbered streams. Coordination inverted from pull to push. Interface shifted from MIB query to YANG publication. Figure 13.1 depicts the push-based architecture.

Figure 13.1: The figure contrasts three telemetry paradigms spanning several orders of magnitude in measurement latency. SNMP polling (leftmost panel) exemplifies the request-response model: management stations periodically query devices (typically every 5 minutes) for counters like packet counts. The 5-minute polling window means operators observe network state with a delay of up to 5 minutes—an anomaly (packet loss spike, queue buildup) is invisible until the next polling cycle. But the architecture is simple: devices are stateless (they only store counters), collectors are simple (send requests, receive responses), and operators control exactly what gets reported. Streaming telemetry (middle panel) inverts this: devices proactively push measurements to collectors at high frequency (every 1–5 seconds), eliminating polling delays. The moment a queue-depth threshold is exceeded, collectors are notified. This enables reactive management: detect anomalies on the seconds timescale rather than waiting for the next 5-minute polling cycle. The cost is complexity: devices must decide what to report, collectors must handle out-of-order delivery and deduplication, and bandwidth grows if reporting is not carefully filtered. In-band telemetry (rightmost panel) represents the frontier: switches embed measurement metadata directly into packet headers as they forward traffic. This achieves per-packet granularity at line rate, with no separate telemetry channel—a radical simplification for high-speed networks. INT (In-Band Network Telemetry) headers carry queue depth, switch utilization, and latency measurements from each hop. The switch can add ~50–100 bytes per packet without reducing forwarding throughput. But hardware support is required—not all switches have telemetry capabilities.
This architectural progression reveals a fundamental tradeoff: latency-of-visibility (how fast operators see what is happening) versus simplicity and deployment constraints (SNMP is simple but slow, streaming is fast but complex, in-band is fastest but hardware-dependent).

13.4.2 Environment → Measurement → Belief: Pull vs. Push

| Layer | SNMP Poll Model | Streaming Telemetry |
|---|---|---|
| Environment | Instantaneous counters on devices | Instantaneous counters on devices |
| Measurement | Sampled every 5 minutes | Streamed at ms–sec cadence |
| Belief | Coarse, delayed, under-sampled | Fine-grained, near-real-time |

The E→M gap was accidentally noisy under SNMP (fix with finer sampling), and streaming telemetry closed it — not by changing what devices know, but by changing how often they tell. The operational price: collectors now handle out-of-order delivery, buffering, deduplication that polling avoided.

13.4.3 “The Gaps Didn’t Matter… Yet.”

Even streaming telemetry caps at counter granularity. Queue-depth updates every 100 ms are useful, but per-packet queue state remained invisible. At 100 Gbps, the next bottleneck was per-packet state: streaming a counter that is already an aggregate yields only aggregate visibility. The next generation pushed measurement into the data plane itself.


13.5 Act 4: “It’s 2015. Packets Should Measure the Network Themselves.”

It’s 2015. Programmable switches are emerging — Barefoot Tofino, XPliant, PISA architectures compiled from P4 (Programming Protocol-independent Packet Processors). For the first time, switch ASICs expose per-packet internal state (ingress timestamp, egress timestamp, queue depth, link utilization) as first-class fields readable by the match-action pipeline. Datacenter operators at Google, Microsoft, and Facebook need microsecond visibility to debug performance. Streaming telemetry from the control plane misses microbursts. Sampling misses rare events. Something has to change.

Changhoon Kim and colleagues at Barefoot propose In-Band Network Telemetry — embed measurement metadata into packets themselves (Kim et al. 2015).

“INT collects and reports network state, by the data plane, without requiring intervention or work by the control plane.” — INT specification, 2015 (Kim et al. 2015)

What the pioneers saw: The control plane was too slow to collect per-packet data because the control plane runs on CPUs and the data plane runs on ASICs at line rate. The gap was fundamental. But if switches could write their state directly into packets passing through them, measurement happened inline — zero additional packets, zero CPU involvement.

What remained invisible: INT’s header overhead (12+ bytes per hop) would become a problem at deeper topologies and smaller packet sizes. Middleboxes that strip unknown options would break the model at administrative boundaries. PTP-grade clock synchronization across switches would become the new bottleneck.

INT applied decision placement by moving measurement into the switch data path. It applied disaggregation by separating measurement-signal insertion (switches, per-hop) from measurement-signal interpretation (sinks and collectors, end-to-end). Switches write telemetry metadata at line rate; sinks strip the metadata and ship it to collectors.
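The insert-then-strip flow can be sketched like this (dictionaries stand in for headers; the field names are illustrative, not the INT specification’s wire format):

```python
import time

# Sketch of the INT data flow: each switch appends per-hop metadata to
# the packet in-band; the sink strips it and ships it to a collector.
def int_transit(pkt: dict, switch_id: str, queue_depth: int) -> dict:
    """Per-hop insertion: runs in the forwarding path, no CPU involved."""
    pkt.setdefault("int_stack", []).append(
        {"switch": switch_id, "queue": queue_depth,
         "ts_ns": time.monotonic_ns()}
    )
    return pkt

def int_sink(pkt: dict):
    """Last hop: strip telemetry before delivery, report it separately."""
    telemetry = pkt.pop("int_stack", [])
    return pkt, telemetry

pkt = {"payload": b"hello"}
for hop, depth in [("s1", 3), ("s2", 47), ("s3", 0)]:
    pkt = int_transit(pkt, hop, depth)
pkt, report = int_sink(pkt)
```

The receiver never sees the metadata; the collector gets a per-hop trace of exactly the path and queues this one packet experienced.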

13.5.1 Invariant Analysis: INT (2015)

| Invariant | INT’s Answer (2015) | Gap? |
|---|---|---|
| State | Per-hop instantaneous snapshots | No history, no aggregation |
| Time | Microsecond per-hop timestamps | Clock sync across switches |
| Coordination | Distributed, per-switch writes | No inter-switch coordination needed |
| Interface | INT header format | Header overhead, MTU interactions |

INT inverted the measurement architecture: previously, the data plane forwarded while the control plane measured. After INT, the data plane measured itself, inline, per packet.

13.5.2 Environment → Measurement → Belief: In-Band

| Layer | What INT Has | What’s Missing |
|---|---|---|
| Environment | Per-switch instantaneous state | Historical state |
| Measurement | Per-packet per-hop metadata | Aggregates, cross-flow analysis |
| Belief | “This packet saw queue=X at hop 3” | “What was average queue across all flows?” |

The E→M gap here is physically limited in a new sense: INT sees everything on the path, but only for packets carrying INT headers, only at the moment they pass. History and aggregation must happen elsewhere.

INT gave operators per-packet, per-hop visibility. But operators still asked aggregate questions — top-k talkers, heavy hitters, DDoS victims. Writing per-question P4 programs was expensive (days per query). Naive “send all INT data to server” produces ~1 billion tuples/sec at 100 Gbps — infeasible.


13.6 Act 5: “It’s 2018. Queries Should Compile to Switches.”

It’s 2018. Operators have INT-enabled switches but still hand-code each telemetry task in P4. Each query requires manual reasoning about switch resource constraints (stages, SRAM, PHV bits) and a custom decision about what to aggregate on-switch versus stream to servers. There are hundreds of useful queries. Hand-coding each is impossible.

Arpit Gupta and colleagues at Princeton build Sonata (Gupta et al. 2018):

“Operators need to express telemetry tasks as declarative queries and let the system figure out how to execute them efficiently across programmable switches and servers.” — Gupta et al., 2018 (Gupta et al. 2018)

What the pioneers saw: The switch and the server are two compute tiers with complementary properties. Switches are fast and constrained (line-rate, bounded state, fixed stages). Servers are flexible and slower (arbitrary computation, unbounded state, software speed). Every telemetry query benefits from being split — but the split is a non-trivial optimization that depends on available switch resources and expected traffic patterns.

What remained invisible: The ILP (integer linear program) partitioning assumed static switch capabilities and static traffic distributions. Real networks drift. Query rebalancing at scale would become an operations problem.

Sonata applied disaggregation explicitly at compile time: a query partition splits operators between switch and server. It applied decision placement by making the split an ILP decision, not an operator choice. The operator writes a declarative dataflow query (filter, map, reduce, distinct); Sonata’s compiler formulates an integer linear program that minimizes tuples-to-server subject to switch resource constraints, solves it, and emits P4 for the switch and streaming code for the server. Figure 13.2 shows the partitioned pipeline.

Figure 13.2: Sonata’s core innovation is automatic disaggregation of measurement queries between hardware (switches) and software (servers). A declarative query specifies what to measure—for example, “detect all source IPs sending traffic exceeding 1% of total volume.” The query compiler analyzes the query and the switch’s capabilities (how many match-action stages are available? how much state memory?), then automatically partitions the computation: the switch executes simple operations (filtering, hashing, counting) at line rate using PISA (Protocol Independent Switch Architecture) pipelines, while servers execute complex operations (exact deduplication, threshold detection, anomaly analysis). The data reduction is dramatic: instead of forwarding 1 million candidate flows per second from switch to server, the switch pre-aggregates to 1,000 candidate flows (a 1,000× reduction). The server then completes analysis on this pre-filtered, pre-aggregated dataset, yielding ~10 heavy-hitter flows—a cumulative 100,000× reduction from raw traffic to results. This disaggregation sidesteps the flexibility–scalability–accuracy triangle, an impossibility result for single-tier systems: you cannot simultaneously achieve all three. Full packet capture achieves flexibility (answer any future query) and accuracy (nothing discarded) but zero scalability (buffers fill in seconds at 100 Gbps). Sampling achieves scalability (sample 1 per 1,000 packets) and flexibility (analyze samples post-hoc) but poor accuracy (rare events are invisible). Sonata achieves both high flexibility and high scalability by accepting that the decision of what to compute happens at query-planning time (via the ILP solver), not at runtime. Once the ILP solver determines the optimal partitioning—which predicates execute on the switch versus the server—the partition is fixed for that query’s lifetime (seconds to hours).
This trades flexibility-at-runtime for flexibility-at-planning-time, enabling both scalability (the switch pre-aggregates, reducing server load) and accuracy (no sampling; exact counts are maintained).

Example query — DNS amplification victims:

victims = (packets
    .filter(lambda p: p.udp_sport == 53)
    .map(lambda p: (p.dst_ip, p.src_ip))
    .distinct()
    .reduce(keys=["dst_ip"], op=sum)
    .filter(lambda k, v: v > threshold))

Naive: send all port-53 packets to the server — millions of tuples/sec. Sonata: execute filter + distinct + count on the switch, stream only the top-k victims — tens of tuples/sec. A reduction of four to five orders of magnitude in tuples-to-server.
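Under the hood, the split looks roughly like this (a Python stand-in for the compiled pipeline; in reality the switch stage is P4 match-action tables and the distinct set is a bounded hash structure, not an unbounded Python set):

```python
from collections import defaultdict

# Sketch of the switch/server split for the DNS-amplification query.
def switch_stage(packets):
    """What fits in hardware: filter, distinct, reduce — at line rate."""
    seen, counts = set(), defaultdict(int)
    for p in packets:
        if p["udp_sport"] != 53:
            continue                      # filter: drop non-DNS traffic
        pair = (p["dst_ip"], p["src_ip"])
        if pair not in seen:              # distinct: count each pair once
            seen.add(pair)
            counts[p["dst_ip"]] += 1      # reduce: per-victim server count
    return counts                         # only aggregates leave the switch

def server_stage(counts, threshold):
    """What needs flexibility: the exact threshold decision."""
    return {dst for dst, n in counts.items() if n > threshold}

pkts = ([{"udp_sport": 53, "dst_ip": "10.0.0.1", "src_ip": f"1.2.3.{i}"}
         for i in range(50)]                                  # 50 DNS servers hit one victim
        + [{"udp_sport": 53, "dst_ip": "10.0.0.2", "src_ip": "9.9.9.9"}] * 100
        + [{"udp_sport": 443, "dst_ip": "10.0.0.3", "src_ip": "8.8.8.8"}])
victims = server_stage(switch_stage(pkts), threshold=10)
```

Note that `10.0.0.2` receives 100 packets but from a single source, so `distinct` keeps its count at 1 — volume alone does not make it an amplification victim.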

13.6.1 Invariant Analysis: Sonata (2018)

| Invariant | Sonata’s Answer (2018) | Gap? |
|---|---|---|
| State | Partitioned switch/server | Bounded by switch resources |
| Time | Windowed, compile-time schedule | Windows set at plan time |
| Coordination | Compile-time ILP, runtime streaming | ILP solve in seconds–minutes |
| Interface | Declarative dataflow language | Limited expressivity (no loops) |

13.6.2 The Impossibility Triangle, Revisited

| Architecture | Flexibility | Scalability | Accuracy |
|---|---|---|---|
| Full packet capture | High | Low | High |
| NetFlow sampling (flow-level export) (Claise et al. 2013) | Medium | High | Low |
| Switch-only queries | Low | High | High |
| Sonata (switch + server) | High | High | High |

Sonata stays within the triangle. It finds a better Pareto point by disaggregating across compute tiers. The cost lives in compile-time ILP solving (seconds to minutes) and query-language restrictions (no unbounded loops). This is the same pattern as SDN (control/data split), RDMA (OS/NIC split), and CDNs (origin/edge split): when one component is resource-constrained, push work to a less-constrained tier and pay the cost in a compiler/planner.

Related work in the same era — Marple (Narayana et al. 2017) with key-value abstractions, Everflow (Zhu et al. 2015) with match-and-mirror, ndb (Handigol et al. 2014) with packet postcards — established programmable telemetry as a category distinct from INT (mechanism) and sketches (approximate counting).


13.7 Act 6: “It’s 2018. Measurement Is Too Late. Verify Before You Deploy.”

It’s 2018. Operators observe failures via streaming telemetry and INT and Sonata. But failures have already happened when they’re observed. The packets are already dropped. A config push at 2 AM causes a blackhole at 2:01; the operator sees it at 2:03 via streaming telemetry; the traffic is down for three minutes. Observation is reactive.

The question becomes: can we verify that a proposed configuration will behave correctly, before deploying it? This is the management-system companion to measurement — if measurement is “observe what happened”, verification is “prove what will happen”.

Three lineages:

Header Space Analysis (Kazemian, 2012) (Kazemian et al. 2012) treats packets as points in \(\{0,1\}^L\) space and network functions as transfer functions, then symbolically computes reachability. A static checker answers: “can traffic from A reach B? Are there loops? Are there black holes?”
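The core algebra can be sketched on toy 4-bit headers, with wildcard strings over {0, 1, x} standing in for header-space sets (vastly simplified relative to HSA’s real transfer-function algebra, which handles unions and inverses):

```python
# Sketch of HSA-style symbolic reachability on 4-bit headers.
# A header set is a wildcard string; a box is a (match, rewrite) pair.
def intersect(a: str, b: str):
    """Intersect two wildcard sets; None means the empty set."""
    out = []
    for x, y in zip(a, b):
        if x == "x": out.append(y)
        elif y == "x" or x == y: out.append(x)
        else: return None          # contradictory concrete bits
    return "".join(out)

def apply_box(hs, match: str, rewrite: str):
    """Transfer function: keep what matches, then rewrite fixed bits."""
    if hs is None or (m := intersect(hs, match)) is None:
        return None                # all packets dropped by this box
    return "".join(r if r != "x" else c for r, c in zip(rewrite, m))

# Path A -> firewall (passes only headers 1xxx) -> NAT (forces bit 0 to 0) -> B
reach = apply_box("xxxx", "1xxx", "xxxx")   # survivors of the firewall
reach = apply_box(reach, "xxxx", "0xxx")    # after the NAT rewrite
```

Reachability from A to B is simply whether the composed result is non-empty; the same machinery, run over all paths, answers loop and black-hole queries.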

VeriFlow (Khurshid, 2013) (Khurshid et al. 2013) verifies invariants in real time — as each rule update arrives at the controller, VeriFlow checks (sub-millisecond) whether reachability, loop-freedom, or waypoint properties are violated. Verification becomes part of the control-plane pipeline.
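VeriFlow’s check-before-install loop can be sketched at toy scale (one equivalence class of packets and a per-switch next-hop map; real VeriFlow first slices header space into equivalence classes and checks each affected class):

```python
# Sketch of real-time rule-update verification: before installing a
# rule, check that the candidate forwarding graph stays loop-free.
def has_loop(next_hop: dict, start: str) -> bool:
    """Walk from `start`; if the prior graph was loop-free, any new
    cycle must pass through the updated switch, so this walk finds it."""
    seen, node = set(), start
    while node in next_hop:
        if node in seen:
            return True
        seen.add(node)
        node = next_hop[node]
    return False                    # walk fell off the graph: no loop

def install_rule(next_hop: dict, switch: str, hop: str) -> bool:
    candidate = dict(next_hop, **{switch: hop})
    if has_loop(candidate, switch):
        return False                # reject the update, keep old state
    next_hop.update(candidate)
    return True

fib = {"s1": "s2", "s2": "s3"}
ok_update = install_rule(fib, "s3", "dst")   # s3 -> dst: accepted
bad_update = install_rule(fib, "dst", "s1")  # would close a cycle: rejected
```

The check is per-update and local to the affected class, which is what makes sub-millisecond verification in the controller’s rule pipeline plausible.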

Batfish (Fogel, 2015) (Fogel et al. 2015) parses multi-vendor configuration files (Cisco IOS, Juniper JunOS, Arista EOS), simulates the control plane (BGP, OSPF, static routes), computes the resulting RIBs/FIBs, and answers reachability queries. Verification before deployment.

“Most network outages are caused by configuration errors, not hardware failures or software bugs. We need to analyze configurations before they are deployed.” — Fogel et al., 2015 (Fogel et al. 2015)

What the pioneers saw: Runtime observation is fundamentally reactive. The cheapest packet drop is the one that never happens because the config that would have caused it was rejected at review time.

What remained invisible: Configurations are not the whole story. Runtime state drifts (BGP flaps, HSRP failovers, ACL hit-counters). Pure config analysis misses these. Follow-on work (Minesweeper, Plankton, SyNET) closed some of these gaps.

These systems applied disaggregation by separating intent (what the operator wants) from configuration (what the devices run), enabling tools to reason about their gap. They applied closed-loop reasoning in the counterfactual: verification answers “could this configuration violate intent?” — which is the symbolic closed-loop between intent and config.

13.7.1 Invariant Analysis: Verification Tools

| Invariant | Batfish/HSA/VeriFlow |
|---|---|
| State | Symbolic — all packets, all failures simultaneously |
| Time | Static (Batfish/HSA) or rule-update incremental (VeriFlow) |
| Coordination | Centralized analyzer over all configs |
| Interface | Query language (“can A reach B?”) |

Verification is measurement in the counterfactual. Operational telemetry answers “what is happening?” Verification answers “what could happen?”


13.8 Act 7: “It’s 2018+. eBPF Brings Programmability Back to the Host.”

It’s 2018. The data-plane programmability story moved measurement into switches. Meanwhile, a parallel story unfolds at the host: eBPF (extended Berkeley Packet Filter) and XDP (eXpress Data Path) extend the original BPF model (Act 1) into a full kernel programming environment. Programs can run at the earliest point in the receive path (XDP, at the NIC driver), with safe in-kernel execution guaranteed by the kernel’s eBPF verifier.

eBPF inherits BPF’s foundational idea — compiled safe programs in the kernel — and extends it with maps (persistent state across packets), helper functions, and cross-subsystem instrumentation. Host-level programmable telemetry now matches in-network programmable telemetry.

What changed: The BPF model assumed stateless per-packet predicates (Act 1). eBPF adds cross-packet state via maps, enabling the same query patterns Sonata compiles to switches, but executing on the host. This closes the loop: the same query language that targets INT-capable switches can target eBPF-capable hosts, blending in-network and host telemetry.
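The difference from Act 1’s stateless predicates can be sketched with a map-backed counter (a Python stand-in; the real mechanism is a BPF hash map updated by the in-kernel program and read from user space via syscalls):

```python
from collections import Counter

# Sketch of cross-packet state via an eBPF-style map: a per-flow
# counter updated on every packet, queried separately from user space.
flow_map = Counter()               # stands in for a BPF hash map

def on_packet(src_ip: str, dst_ip: str, proto: int) -> None:
    """The 'in-kernel' per-packet path: one map update, no copies out."""
    flow_map[(src_ip, dst_ip, proto)] += 1

def heavy_flows(min_pkts: int):
    """The 'user-space' path: read the map, answer the aggregate query."""
    return [flow for flow, n in flow_map.items() if n >= min_pkts]

for _ in range(150):
    on_packet("10.0.0.1", "10.0.0.2", 6)
on_packet("10.0.0.9", "10.0.0.2", 17)
```

This is exactly the query shape classic BPF could not express (“flows with more than 100 packets”), now answerable at the host without any packet leaving the kernel.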


13.9 The Grand Arc: From tcpdump to Verified Intent

13.9.1 The Evolving Anchor

| Era | Binding Constraint | Measurement Locus | Belief Quality |
|---|---|---|---|
| 1993 | User/kernel copy cost | Host kernel (BPF) | Per-packet, stateless |
| 1997 | Ground truth unobservable | Distributed probes | Probabilistic distributions |
| 2005 | Polling too slow | Device → collector (pull) | Coarse, delayed |
| 2016 | Polling inverts | Device → collector (push) | Fine-grained streams |
| 2015 | Control plane too slow | In-band, per packet | Per-hop, per-packet |
| 2018 | Query flexibility at line rate | Switch + server (Sonata) | Partitioned aggregates |
| 2015+ | Observation is reactive | Symbolic analyzer | Counterfactual proofs |
| 2018+ | Host visibility gap | Kernel eBPF/XDP | Host-side programmable |

13.9.2 Three Design Principles Applied Across the Arc

Disaggregation. Every generation separated concerns differently. BPF separated filter expression from filter execution. Streaming telemetry separated configuration push from state streaming. INT separated per-hop metadata insertion from end-to-end interpretation. Sonata separated line-rate aggregation (switch) from complex analysis (server). Verification separated intent from configuration. The same tool, applied to different constraints.

Closed-loop reasoning. Paxson modeled the measurement instrument itself as part of the system under observation. Streaming telemetry tightened the loop from 5-minute polls to millisecond streams. Verification applies closed-loop reasoning in the counterfactual: prove the loop between intent and deployed behavior closes, before the packets flow.

Decision placement. The arc is a story of pushing measurement decisions closer to the data. BPF pushed filter decisions into the kernel. INT pushed measurement into the switch ASIC per packet. Sonata made the switch-vs-server placement itself an ILP decision. eBPF pushed measurement back to the host kernel with full programmability.

13.9.3 The Dependency Chain

flowchart TD
  A[10 Mbps Ethernet, CPU bottleneck] --> B[BPF 1993: compiled kernel filters]
  B --> C[Commercial Internet growth]
  C --> D[Paxson 1997: probabilistic methodology]
  D --> E[10 Gbps links, microburst events]
  E --> F[SNMP polling 5-min intervals miss events]
  F --> G[Streaming telemetry 2016: push model]
  G --> H[100 Gbps, per-packet visibility needed]
  H --> I[INT 2015: data plane measures itself]
  I --> J[Per-query P4 hand-coding infeasible]
  J --> K[Sonata 2018: declarative + ILP partition]
  H --> L[Runtime observation is reactive]
  L --> M[Batfish/HSA/VeriFlow: verify before deploy]
  B --> N[eBPF/XDP 2018+: host-side programmable]

  style A fill:#e3f2fd
  style C fill:#e3f2fd
  style E fill:#e3f2fd
  style H fill:#e3f2fd
  style F fill:#ffebee
  style J fill:#ffebee
  style L fill:#ffebee
  style B fill:#e8f5e9
  style D fill:#e8f5e9
  style G fill:#e8f5e9
  style I fill:#e8f5e9
  style K fill:#e8f5e9
  style M fill:#e8f5e9
  style N fill:#e8f5e9

13.9.4 Pioneer Diagnosis Table

| Year | Pioneer | Invariant Diagnosed | Contribution |
|---|---|---|---|
| 1993 | McCanne & Jacobson | Interface | Compiled kernel packet filters (BPF) |
| 1997 | Paxson | State | Internet measurement methodology |
| 2012 | Kazemian | State (symbolic) | Header Space Analysis |
| 2013 | Khurshid | Time (real-time) | VeriFlow rule-update verification |
| 2015 | Kim | Interface (in-band) | INT per-hop telemetry |
| 2015 | Fogel | State (counterfactual) | Batfish config analysis |
| 2016 | OpenConfig WG | Coordination (push) | gNMI streaming telemetry |
| 2017 | Narayana | Interface (language) | Marple per-packet queries |
| 2018 | Gupta | Coordination (compile-time) | Sonata query-driven telemetry |

13.9.5 Innovation Timeline

flowchart TD
    subgraph sg1["Foundations"]
        A1["1988 — SNMPv1 polling"]
        A2["1993 — BPF compiled filters"]
        A3["1996 — NetFlow v5"]
        A4["1997 — Paxson: methodology"]
        A1 --> A2 --> A3 --> A4
    end
    subgraph sg2["Streaming Era"]
        B1["2001 — sFlow: sampled packet export"]
        B2["2002 — SNMPv3"]
        B3["2011 — NETCONF/YANG"]
        B4["2013 — IPFIX standardization"]
        B1 --> B2 --> B3 --> B4
    end
    subgraph sg3["Verification"]
        C1["2012 — Header Space Analysis"]
        C2["2013 — VeriFlow: real-time"]
        C3["2015 — Batfish: config analysis"]
        C1 --> C2 --> C3
    end
    subgraph sg4["Programmable"]
        D1["2015 — INT: in-band telemetry"]
        D2["2016 — gNMI/OpenConfig streaming"]
        D3["2017 — Marple"]
        D4["2018 — Sonata: query-driven"]
        D5["2018 — eBPF/XDP matures"]
        D1 --> D2 --> D3 --> D4 --> D5
    end
    sg1 --> sg2 --> sg3 --> sg4

Measurement & Management Innovations


13.10 Generative Exercises

Exercise 1: Measurement Under Privacy Constraints

A regulator requires that ISP measurement platforms never expose per-user browsing patterns, yet operators still need to diagnose bufferbloat, identify top-talking flows, and detect DDoS attacks. Design a Sonata-style query partition that guarantees per-user anonymity while preserving operational visibility.

  • Which State invariant answer changes? What can the switch aggregate on, and what can it not?
  • How does the Interface invariant change at the server boundary?
  • Where on the impossibility triangle (flexibility/scalability/accuracy) do you spend the privacy cost?
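One possible starting point, sketched in Python (the k-anonymity rule, stage names, and data shapes are illustrative, not a complete answer): the switch aggregates only by destination prefix plus a distinct-user count, so no (user, destination) pair ever leaves the switch, and the server releases an aggregate only when at least k distinct users contributed to it.

```python
from collections import Counter

def switch_stage(packets):
    """On-switch stage (sketch): aggregate bytes per destination prefix
    and count distinct contributing users. No (user, destination) pair
    leaves the switch. A real switch would use a compact sketch such as
    HyperLogLog for the distinct-user count, not a set."""
    nbytes = Counter()
    users = {}
    for user, dst_prefix, size in packets:
        nbytes[dst_prefix] += size
        users.setdefault(dst_prefix, set()).add(user)
    return nbytes, {p: len(s) for p, s in users.items()}

def server_stage(nbytes, user_counts, k=3):
    """Server stage (sketch): release an aggregate only if at least k
    distinct users contributed, so no released row can pin down one
    user's browsing."""
    return {p: b for p, b in nbytes.items() if user_counts[p] >= k}
```

Bufferbloat and DDoS queries fit the same pattern because both aggregate over many users; the privacy cost lands on the accuracy corner of the triangle, since every aggregate with fewer than k contributors is suppressed.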
Exercise 2: Verification Under Runtime Drift

Batfish verifies configurations pre-deployment. But BGP flaps, HSRP failovers, and ACL hit-counters cause runtime state to drift from configured intent within hours. Propose a hybrid architecture that combines Batfish-style configuration verification with streaming telemetry to close the intent→runtime loop continuously.

  • Which invariant does pure config verification miss?
  • What is the period of the closed loop between intent verification and runtime drift detection?
  • Where does this loop live on the distributed-centralized axis?
Exercise 3: Measurement at Administrative Boundaries

INT embeds telemetry in packet headers. When a flow crosses from an INT-capable domain to a middlebox-heavy domain that strips unknown options, telemetry is silently lost. Design a fallback mechanism that detects the boundary, degrades gracefully, and still provides useful end-to-end visibility.

  • How do you detect that INT headers are being stripped?
  • Which measurement-quality category (accidentally noisy / physically limited / structurally filtered) describes signals that cross this boundary?
  • What new Interface contract would you propose between administrative domains to preserve minimal telemetry?

13.11 Summary

Measurement systems answer the same four invariants as operational systems, bound by the Heisenberg constraint: active probes perturb what they measure, passive observation is always incomplete, and the impossibility triangle (flexibility/scalability/accuracy) forbids having all three. Every generation found a different Pareto point — BPF by compiling filters into the kernel, Paxson by treating noise as first-class, streaming telemetry by inverting pull to push, INT by embedding measurement in packets, Sonata by partitioning queries across switch and server via compile-time ILP, and Batfish/HSA/VeriFlow by moving observation into the counterfactual via symbolic verification.

The arc ends at a research frontier where measurement, management, and verification converge. Intent-based networking proposes a single closed loop: operators state intent, verification proves configs meet intent, streaming telemetry observes runtime, and automation reconciles drift. The same query language targets both eBPF hosts and INT switches. The same framework answers “what is happening?”, “what could happen?”, and “what should happen?” The Heisenberg constraint is not escaped — it is engineered around, one disaggregation at a time.


  1. Berkeley Packet Filter (BPF) is a register-based virtual machine that runs in the kernel. User-space programs compile filter expressions (e.g., “tcp port 80”) into BPF bytecode, which the kernel executes on every packet without copying the full packet to user space. This avoids the kernel-user copy bottleneck that limited earlier packet capture.↩︎

  2. Integer Linear Programming (ILP) is an optimization technique where the objective and constraints are linear and variables must be integers. Sonata uses ILP to partition query operators between switches (fast, limited state) and servers (slow, unlimited state), minimizing the traffic that must leave the switch.↩︎
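A toy illustration of that optimization, under a simplifying assumption the real system does not make: a single linear query pipeline, so the only decision is where to cut it (Sonata's actual ILP partitions many queries jointly across several switch resources). All state costs and packet rates below are made up.

```python
def best_cut(ops, raw_rate, switch_state_capacity):
    """Choose how many pipeline operators run on the switch (a prefix
    of the pipeline, since each operator consumes its predecessor's
    output) so that their total state fits on the switch and the
    traffic forwarded to the server is minimized.

    ops: list of (state_cost, output_rate) per operator, in pipeline
    order. Returns (num_ops_on_switch, traffic_to_server).
    """
    best_k, best_traffic = 0, raw_rate  # baseline: mirror all traffic
    state = 0
    for k, (cost, rate) in enumerate(ops, start=1):
        state += cost
        if state > switch_state_capacity:
            break                       # this prefix no longer fits
        if rate < best_traffic:
            best_k, best_traffic = k, rate
    return best_k, best_traffic

# Hypothetical pipeline: filter -> aggregate -> heavy-hitter threshold,
# shrinking 1000 pkt/s to 500, then 50, then 5. With room for only the
# first two operators' state, the cut lands after the aggregate.
print(best_cut([(10, 500), (20, 50), (100, 5)], raw_rate=1000,
               switch_state_capacity=50))   # -> (2, 50)
```

Because feasible assignments are prefixes, exhaustively scanning the cut points is exact here; it is the joint, multi-query, multi-resource version of this problem that motivates a real ILP solver.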