Faster Dimension Reduction

被引:44
作者
Ailon, Nir [1 ]
Chazelle, Bernard [1 ]
机构
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
基金
美国国家科学基金会;
关键词
APPROXIMATE NEAREST-NEIGHBOR; FAST RANDOMIZED ALGORITHM; JOHNSON-LINDENSTRAUSS; LINEAR COMPLEXITY; QUERIES; TRANSFORM; MATRICES; GRAPHS; SPACES;
D O I
10.1145/1646353.1646379
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Data represented geometrically in high-dimensional vector spaces can be found in many applications. Images and videos, are often represented by assigning a dimension for every pixel (and time). Text documents may be represented in a vector space where each word in the dictionary incurs a dimension. The need to manipulate such data in huge corpora such as the web and to support various query types gives rise to the question of how to represent the data in a lower-dimensional space to allow more space and time efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal-component analysis, PCA) while avoiding the curse of dimensionality. The fact that such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s. The underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, did not attract much attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's Uncertainty Principle. This led to a significant speedup, which also retained the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.
引用
收藏
页码:97 / 104
页数:8
相关论文
共 40 条
[1]   Database-friendly random projections: Johnson-Lindenstrauss with binary coins [J].
Achlioptas, D .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :671-687
[2]  
AILON N, 2008, DISCRETE COMPUTATION
[3]  
AILON N, 2008, APPROXRANDOM, P512
[4]   THE FAST JOHNSON-LINDENSTRAUSS TRANSFORM AND APPROXIMATE NEAREST NEIGHBORS [J].
Ailon, Nir ;
Chazelle, Bernard .
SIAM JOURNAL ON COMPUTING, 2009, 39 (01) :302-322
[5]   Problems and results in extremal combinatorics - I [J].
Alon, N .
DISCRETE MATHEMATICS, 2003, 273 (1-3) :31-53
[6]  
Alon N., 2015, PROBABILISTIC METHOD
[7]  
[Anonymous], 2004, HDB DISCRETE COMPUTA
[8]   Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems [J].
Arora, S .
JOURNAL OF THE ACM, 1998, 45 (05) :753-782
[9]  
ARYA S, 1993, PROCEEDINGS OF THE FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P271
[10]   An optimal algorithm for approximate nearest neighbor searching in fixed dimensions [J].
Arya, S ;
Mount, DM ;
Netanyahu, NS ;
Silverman, R ;
Wu, AY .
JOURNAL OF THE ACM, 1998, 45 (06) :891-923