Research

Research Philosophy

In today's world, access to information and communication technology is not just a convenience; it is increasingly seen as a basic human right. Such access is vital for people around the world to derive socioeconomic benefits from the Internet through activities such as education, healthcare, commerce, and civic participation. However, despite years of effort from various stakeholders, a significant digital divide persists. It sharply separates those with seamless access to the Internet and cutting-edge communication technologies from those who remain underserved. The wide-ranging social and economic consequences of this divide cannot be overstated.

I am committed to pursuing a research agenda aimed at forging a path toward a more equitable digital future.

To this end, my research focuses on two key themes. First, I explore how to advance Artificial Intelligence (AI) and Machine Learning (ML) for cybersecurity, with a special emphasis on democratizing the development of production-ready AI/ML artifacts. A primary aim of this effort is to lower the barrier to collecting the right data for training ML models. Such models would be especially beneficial in network environments with limited budgets, operational capacity, and technical expertise. These environments stand to gain a great deal from advances in AI and ML. Yet they struggle to adopt these technologies because the bar for developing trustworthy and generalizable ML artifacts remains high.

The second theme is Internet measurement research, with an emphasis on enabling data-driven policymaking. This work centers on giving policymakers access to the right data, which helps them evaluate existing policies and informs the synthesis of new ones. This effort is crucial for making the best use of limited capital resources to benefit underprivileged communities and to address their specific needs more effectively.

Production-ready ML for Networks → Self-driving Networks

According to the report of the National Security Commission on AI, advances in AI have empowered malicious actors, making our digital ecosystems more vulnerable to a range of cyber threats. To counter this rising threat, we need an AI-enabled cybersecurity stack composed of numerous intelligent modules, or bots. Working together, these modules should extract subtle trends from data, identify diverse attack vectors and workflows, and help synthesize appropriate defense policies to neutralize these threats. Moreover, it is crucial to democratize access to this AI-enabled cybersecurity stack, which we refer to as self-driving networks.

The goal is to develop an AI-enabled stack that keeps the network secure and performant with minimal human intervention. Specifically, we explore how to leverage machine learning (ML) and software-defined networking (SDN) to lower the cost of deploying and operating highly available, reliable, performant, and secure last-mile and enterprise networks.

Developing self-driving networks requires solving several fundamental research problems, including answering how we can:

Enable accurate and flexible (streaming) analytics over network data at scale (see Sonata, DynamiQ, Panakos, OpTel);
Lower the threshold to curate high-quality datasets for different learning problems from diverse network environments at scale (see PINOT, netUnicorn);
Develop production-ready ML artifacts that can both accurately assess the network’s state and take effective actions to keep networks performant and secure (see netUnicorn, Trustee); and
Establish trust in ML-based artifacts so network operators feel confident enough to relinquish control to these artifacts in production settings (see Trustee).
…

Data-driven Policymaking

The goal here is to develop tools and infrastructure for collecting the right data to inform policy interventions that target digital equity, such as consumer subsidy programs, rate regulations, and infrastructure funding.

Addressing the data problem for policymakers entails solving several fundamental research problems, including answering the following questions:

What broadband plans, in terms of both speed and price, are available in a region? See our SIGCOMM paper for details.
How can we make the best use of noisy crowdsourced network measurement data? See our IMC paper on this topic for more details.
How can we quantify the efficacy of different policy interventions (e.g., consumer subsidies, rate regulations)?
…

Ongoing Projects

The following projects provide a representative sample of ongoing research activities at SNL:

Trustee: A framework that cracks open decision-making for black-box ML models (for networks) using high-fidelity, low-complexity, and stable decision trees.
BQT: A tool that queries broadband plan offerings from major ISPs in the US at street-level granularity.
PINOT: A programmable data-collection infrastructure at UCSB to collect fine-grained (labeled) network data at scale.
netUnicorn: A data-collection platform that simplifies collecting network data for different learning problems from diverse network environments.
netFound: A foundation model for network data. It uses self-supervised learning for task-agnostic pre-training on abundant unlabeled network data, passively collected from production environments using PINOT. It is then fine-tuned for specific tasks on smaller-scale labeled network data, actively collected using PINOT and netUnicorn.

Funding

The research in my group is supported by government agencies, including the National Science Foundation (NSF) and the Department of Energy (DoE). It is also supported by network and content service providers such as Google, Verizon Innovations, and ViaSat, and by vendors such as Intel and Cisco.

You can find more details about some of the funded projects here:

Low Infrastructure ML ($100k, Google, 2025-26)
Network Foundation Model for Enabling AI-powered Network Operations (AIOps) ($60k, Google, 2025-26)
Characterizing Broadband Pricing in California ($125k, California Public Utilities Commission, 2025-27)
Characterizing Barriers to Digital Inclusion in Virginia ($30k, Virginia Joint Commission on Technology and Science, 2025)
Developing Generalizable ML Models for Diverse Learning Problems in Network Operations ($700k, NSF, 2025-30)
Telemetry-driven Foundation Models for Self-Driving Networks ($90k, Cisco Research, 2024-25)
netFound: Network Foundation Model (DoE, 2024-28)
IMR: MT: NetFlex: A Flexible Scalable & Privacy-Preserving Network Measurement Platform to Iteratively Collect Multi-modal Multi-view Network Data from Access Networks ($600k, NSF, 2023-25)
IMR: RI-P: Programmable Closed-loop Measurement Platform for Last-Mile Networks ($100k, NSF, 2022-24)
IMR: MM-1A: ADDRESS: Augment, Denoise and Debias Crowdsourced Measurements for Statistical Synthesis of Internet Access Characterization ($600k, NSF, 2022-25)
CC* Integration-Large: Democratizing Networking Research in the Era of AI/ML ($1M, NSF, 2021-24)
CC* Integration-Large: Bringing Code to Data: A Collaborative Approach to Democratizing Internet Data Science ($1M, NSF, 2021-24)
The Estimation and Monitoring of Quality of Experience Delivered over Internet Services ($200k, ViaSat, 2022-*)
MLWiNS: RL-based Self-driving Wireless Network Management System for QoE Optimization ($820k, NSF and Intel, 2020-24)
Scaling Cybersecurity Infrastructure using Programmable Data Planes ($200k, Verizon, 2019-22)

Arpit Gupta