ABSTRACT

Similarity search is known to be achievable in at least three ways. e rst technique is called exhaustive search. For a given query, the distance between the query and the database object is calculated while keeping track of which objects are similar (near) to the query. e main problem with this technique is that it does not scale with large collections. e second technique is exact search. Using space decomposition techniques, the number of objects that are compared to the query is reduced. is technique is not ecient for highdimensional data sets, due to the curse of dimensionality [2]. As the dimensions increase, a large part of the database needs to be scanned. Hence, the performance becomes similar to that of exhaustive search. e third technique is approximate search. It reduces the amount of data that needs to be accessed by using some space partitioning (generally done using pivots) [3-5] or space transformation techniques [6-10]. It provides fast and scalable response time while accepting some imprecision in the results. e most common techniques for that are the locality-sensitive hashing (LSH) [7], FastMap [6], and M-tree [8]. A survey can be found in Reference 11. Recently, a new technique based on permutations is proposed, which is called permutation-based indexing (PBI) [12].