UCSB CS Theory Colloquium Series

Spring 2022

Friday, April 15, 2pm, Phelps 2510

Sepideh Mahabadi (Microsoft)

Title: Diversity Maximization over Large Data Sets

Abstract: In this talk, we consider efficient construction of "composable core-sets" for the task of diversity maximization. A core-set is a subset of the data set that is sufficient for approximating the solution to the whole data set. A composable core-set is a core-set with the composability property: given a collection of data sets, the union of the core-sets for all data sets in the collection, should be a core-set for the union of the data sets. Using composable core-sets one can obtain efficient solutions to a wide variety of massive data processing applications, such as the streaming and the map-reduce models of computation.
The notion of diversity can be captured using several measures such as "minimum pairwise distance", "sum of pairwise distances", and the "volume" of the selected points, which has recently gained a lot of interest for modeling diversity. In this talk, we will review these notions and present core-set construction algorithms that are simple to implement and achieve almost optimal approximation guarantee.