Technical Report TRCS98-37, Computer Science Department, University of California, Santa Barbara, December 1998.

Ömer Egecioglu

How to Approximate the Inner-product: Fast Dynamic Algorithms for Euclidean Similarity

Abstract. We develop dynamic dimensionality reduction based on the approximation of the standard inner-product. This results in a family of fast algorithms for checking similarity of objects whose feature representations are large dimensional real vectors, a common situtiton in various multimedia databases. The method uses the power symmetric functions of the components of the vectors, which are powers of the p-norms of the vectors for p = 1, 2,.., m. The number m of such norms used is a parameter of the algorithm whose simplest instance gives a first-order approximation implied by the Cauchy-Schwarz inequality. We show how to compute fixed coefficients that work as universal weights based on the moments of the probability density function assumed for the distribution of the components of the input vectors in the data set. If the distribution of the components of the vectors is not known we show how the method can be adapted to work dynamically by incremental adjustment of the parameters.

Keywords: Distance approximation, inner-product, similarity search, dimensionality reduction, least-squares.

omer@cs.ucsb.edu