Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

Cited: 2505
Authors
Halko, N. [1]
Martinsson, P. G. [1]
Tropp, J. A. [2]
Affiliations
[1] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
[2] CALTECH, Pasadena, CA 91125 USA
Funding
U.S. National Science Foundation;
Keywords
dimension reduction; eigenvalue decomposition; interpolative decomposition; Johnson-Lindenstrauss lemma; matrix approximation; parallel algorithm; pass-efficient algorithm; principal component analysis; randomized algorithm; random matrix; rank-revealing QR factorization; singular value decomposition; streaming algorithm;
DOI
10.1137/090771806
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed (either explicitly or implicitly) to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
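The two-stage framework described in the abstract — random sampling to identify a subspace capturing most of the action of the matrix, followed by deterministic factorization of the compressed matrix — can be sketched in NumPy. This is a minimal illustration, not the paper's reference implementation; the function name, the oversampling parameter `p`, and the power-iteration count `n_iter` are illustrative choices.

```python
import numpy as np

def randomized_svd(A, k, p=10, n_iter=2, seed=0):
    """Sketch of a two-stage randomized SVD.

    k: target rank; p: oversampling; n_iter: power iterations
    (helpful when the singular values decay slowly).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage A: random sampling to find a subspace capturing the action of A.
    Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Y = A @ Omega                             # sample the range of A
    for _ in range(n_iter):                   # optional power iterations
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for the sample
    # Stage B: compress A to the subspace, then factor deterministically.
    B = Q.T @ A                               # small (k + p) x n matrix
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat                              # lift left factors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]
```

When A has exact rank k, the sampled subspace captures its range almost surely and the reconstruction `U @ np.diag(s) @ Vt` matches A to machine precision; for general matrices the error is controlled by the analysis in the paper.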
Pages: 217 / 288
Page count: 72