Proc. of the 5-th European Conference on Principles of
Data Mining and Knowledge Discovery (PKDD 2001),
LNAI 2168,
Freiburg, Germany, September 2001, pp. 79-90.
Ömer Egecioglu
Parametric Approximation Algorithms for High-Dimensional Euclidean
Similarity
Abstract.
We introduce a spectrum of algorithms for measuring the
similarity of high-dimensional vectors in Euclidean space. The
algorithms proposed consist of a convex combination of
two measures: one which contains
summary data about the shape of a vector, and the other
about the relative magnitudes of the coordinates.
The former is based on a concept called bin-score
permutations and a metric to quantify similarity of permutations,
the latter on another novel approximation for
inner-product computations based on power symmetric functions,
which generalizes the Cauchy-Schwarz inequality.
We present experiments on time-series data on labor statistics unemployment
figures that show
the effectiveness of the algorithm as a function of the parameter that
combines the two parts.
omer@cs.ucsb.edu