ABSTRACT

ABSTRACT Representing objects such as images by their feature vectors and searching for similarity according to the distances of the points representing them in high-dimensional space via k-nearest neighbors (k-NNs) to a target image is a popular paradigm. We discuss a combination of singular value decomposition (SVD), clustering, and indexing to reduce the cost of processing k-NN queries for large data sets with highdimensional data. We rst review dimensionality reduction methods with emphasis on SVD and related methods, followed by a survey of clustering and indexing methods for high-dimensional numerical data. We describe combining SVD and clustering as a framework and the main memory-resident ordered partition (OP)-tree index to speed up k-NN queries. We discuss techniques to save the OP-tree on disk and specify the stepwise dimensionality increasing (SDI) index suited for k-NN queries on dimensionally reduced data.