Faster Dimension Reduction

被引:44
作者
Ailon, Nir [1 ]
Chazelle, Bernard [1 ]
机构
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
基金
美国国家科学基金会;
关键词
APPROXIMATE NEAREST-NEIGHBOR; FAST RANDOMIZED ALGORITHM; JOHNSON-LINDENSTRAUSS; LINEAR COMPLEXITY; QUERIES; TRANSFORM; MATRICES; GRAPHS; SPACES;
D O I
10.1145/1646353.1646379
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data represented geometrically in high-dimensional vector spaces can be found in many applications. Images and videos, are often represented by assigning a dimension for every pixel (and time). Text documents may be represented in a vector space where each word in the dictionary incurs a dimension. The need to manipulate such data in huge corpora such as the web and to support various query types gives rise to the question of how to represent the data in a lower-dimensional space to allow more space and time efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal-component analysis, PCA) while avoiding the curse of dimensionality. The fact that such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s. The underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, did not attract much attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's Uncertainty Principle. This led to a significant speedup, which also retained the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.
引用
收藏
页码:97 / 104
页数:8
相关论文
共 40 条
[21]   THE JOHNSON-LINDENSTRAUSS LEMMA AND THE SPHERICITY OF SOME GRAPHS [J].
FRANKL, P ;
MAEHARA, H .
JOURNAL OF COMBINATORIAL THEORY SERIES B, 1988, 44 (03) :355-362
[22]  
Indyk P., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P604, DOI 10.1145/276698.276876
[23]  
Indyk P, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P371
[24]   On approximate nearest neighbors in non-Euclidean spaces [J].
Indyk, P .
39TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1998, :148-155
[25]  
Johnson William B, 1984, C MODERN ANAL PROBAB
[26]  
Kleinberg J. M., 1997, P 20 9 ANN ACM S THE, P599
[27]   Efficient search for approximate nearest neighbor in high dimensional spaces [J].
Kushilevitz, E ;
Ostrovsky, R ;
Rabani, Y .
SIAM JOURNAL ON COMPUTING, 2000, 30 (02) :457-474
[28]   Randomized algorithms for the low-rank approximation of matrices [J].
Liberty, Edo ;
Woolfe, Franco ;
Martinsson, Per-Gunnar ;
Rolchlin, Vladimir ;
Tyger, Mark .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (51) :20167-20172
[29]   GEOMETRY OF GRAPHS AND SOME OF ITS ALGORITHMIC APPLICATIONS [J].
LINIAL, N ;
LONDON, E ;
RABINOVICH, Y .
COMBINATORICA, 1995, 15 (02) :215-245
[30]   Guillotine subdivisions approximate polygonal subdivisions:: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems [J].
Mitchell, JSB .
SIAM JOURNAL ON COMPUTING, 1999, 28 (04) :1298-1309