Proc. 6-th String Processing and Information REtrieval
Conference (SPIRE'99), IEEE Computer Society,
Sep. 1999, Cancun, Mexico, pp. 8-15.
Abdullah N. Arslan and Ömer Egecioglu
An Efficient Uniform-Cost Normalized Edit Distance Algorithm
Abstract.
A common model for computing the similarity of
two strings X and Y of lengths m, and n respectively with m >=n,
is to transform
X into Y through a sequence of
three types of edit operations:
insertion, deletion, and substitution.
The model assumes a given cost
function which assigns a non-negative real weight
to each edit operation.
The amortized weight for a given edit sequence is
the ratio of
its weight to its length, and the minimum of this ratio
over all
edit sequences
is the
normalized edit distance.
Existing algorithms for normalized edit distance computation
with proven complexity bounds
require O(mn^2) time in the
worst-case.
We give an
O(mn logn)-time algorithm for the problem
when the cost function is uniform,
i.e,
the weight of each edit operation is constant within
the same type, except substitutions can have different weights depending on whether
they are matching or non-matching.
omer@cs.ucsb.edu