(preprint)
Ömer Egecioglu and Hakan Ferhatosmanoglu
Dynamic Dimensionality Reduction and Similarity Distance Computation by Inner
Product Approximations
Abstract.
Developing efficient ways for dimensionality reduction is crucial for the query
performance in multimedia
databases. For approximation queries a careful analysis must be performed on the
approximation quality of the
dimensionality reduction technique. Besides having the lower-bound property, we
expect the techniques to have
good quality of distance measures when the similarity distance between two feature
vectors is approximated by some
notion of distance between two lower dimensional transformed vectors. Thus it is
desirable to develop techniques
which have accurate approximations to the original similarity distance when we
eschew the lower-bound property. In
this paper, we develop dynamic dimensionality reduction based on the approximation
of the standard inner-product.
The method uses the power symmetric functions of the components of the vectors,
which are powers of the p-norms
of the vectors for p = 1,2,.., m. The number m of such norms used is a parameter
of the algorithm whose simplest
instance gives a first-order approximation implied by the Cauchy-Schwarz
inequality. We show how to compute
fixed coefficients that work as universal weights based on the moments of the
probability density function assumed
for the distribution of the components of the input vectors in the data set. If
the distribution of the components of the
vectors is not known we show how the method can be adapted to work dynamically by
incremental adjustment of the
parameters. The experiments conducted on various distributions show that with this
technique the similarity between
two objects in high dimensional space can be accurately approximated by a
significantly lower dimensional
representation.
Keywords: Distance approximation, dimensionality reduction, similarity search,
inner-product, least-squares,
symmetric function, p-norm.
omer@cs.ucsb.edu