(preprint)

Ömer Egecioglu and Hakan Ferhatosmanoglu

Dynamic Dimensionality Reduction and Similarity Distance Computation by Inner Product Approximations

Abstract. Developing efficient ways for dimensionality reduction is crucial for the query performance in multimedia databases. For approximation queries a careful analysis must be performed on the approximation quality of the dimensionality reduction technique. Besides having the lower-bound property, we expect the techniques to have good quality of distance measures when the similarity distance between two feature vectors is approximated by some notion of distance between two lower dimensional transformed vectors. Thus it is desirable to develop techniques which have accurate approximations to the original similarity distance when we eschew the lower-bound property. In this paper, we develop dynamic dimensionality reduction based on the approximation of the standard inner-product. The method uses the power symmetric functions of the components of the vectors, which are powers of the p-norms of the vectors for p = 1,2,.., m. The number m of such norms used is a parameter of the algorithm whose simplest instance gives a first-order approximation implied by the Cauchy-Schwarz inequality. We show how to compute fixed coefficients that work as universal weights based on the moments of the probability density function assumed for the distribution of the components of the input vectors in the data set. If the distribution of the components of the vectors is not known we show how the method can be adapted to work dynamically by incremental adjustment of the parameters. The experiments conducted on various distributions show that with this technique the similarity between two objects in high dimensional space can be accurately approximated by a significantly lower dimensional representation.

Keywords: Distance approximation, dimensionality reduction, similarity search, inner-product, least-squares, symmetric function, p-norm.

omer@cs.ucsb.edu