Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

Cited by: 2505
Authors
Halko, N. [1]
Martinsson, P. G. [1]
Tropp, J. A. [2]
Affiliations
[1] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
[2] CALTECH, Pasadena, CA 91125 USA
Funding
U.S. National Science Foundation
Keywords
dimension reduction; eigenvalue decomposition; interpolative decomposition; Johnson-Lindenstrauss lemma; matrix approximation; parallel algorithm; pass-efficient algorithm; principal component analysis; randomized algorithm; random matrix; rank-revealing QR factorization; singular value decomposition; streaming algorithm
DOI
10.1137/090771806
CLC Number
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research that demonstrates randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis.

The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m x n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops), in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
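The two-stage framework described in the abstract (randomized range finding, then deterministic postprocessing of the compressed matrix) translates almost directly into code. The following is a minimal NumPy sketch of that pattern, not the paper's reference implementation; the function name, the oversampling parameter p, and the power-iteration count n_iter are illustrative assumptions.

```python
import numpy as np

def randomized_svd(A, k, p=10, n_iter=2):
    """Minimal sketch of a two-stage randomized SVD.

    A      : (m, n) array to approximate.
    k      : target rank.
    p      : oversampling parameter (illustrative default).
    n_iter : power iterations, useful when singular values decay slowly.
    """
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix and
    # orthonormalize to get a basis Q that captures most of A's action.
    Omega = np.random.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(n_iter):
        # Power iteration with re-orthonormalization for stability.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage B: compress A to the subspace and factor the small matrix
    # deterministically.
    B = Q.T @ A                      # (k + p, n): cheap to decompose
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                    # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

# Example: approximate the 20 dominant singular triplets.
A = np.random.standard_normal((2000, 500))
U, s, Vt = randomized_svd(A, k=20)
```

Note that this sketch draws a dense Gaussian test matrix, so forming A @ Omega costs O(mnk) flops; the O(mn log(k)) figure quoted in the abstract requires a structured test matrix (such as a subsampled random Fourier transform) in its place.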
Pages: 217-288
Page count: 72