AN ALGORITHM FOR THE PRINCIPAL COMPONENT ANALYSIS OF LARGE DATA SETS

被引:167
作者
Halko, Nathan [1 ]
Martinsson, Per-Gunnar [1 ]
Shkolnisky, Yoel [2 ]
Tygert, Mark [3 ]
机构
[1] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
[2] Tel Aviv Univ, Sch Math Sci, Dept Appl Math, IL-69978 Tel Aviv, Israel
[3] NYU, Courant Inst Math Sci, New York, NY 10012 USA
基金
美国国家科学基金会; 以色列科学基金会;
关键词
algorithm; principal component analysis; PCA; singular value decomposition; SVD; low rank; FACE-RECOGNITION ALGORITHMS; RANDOMIZED ALGORITHM; MATRICES; APPROXIMATION;
D O I
10.1137/100804139
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy-even on parallel processors-unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
引用
收藏
页码:2580 / 2594
页数:15
相关论文
共 15 条
[1]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
[2]  
2-9
[3]   ESTIMATING EXTREMAL EIGENVALUES AND CONDITION NUMBERS OF MATRICES [J].
DIXON, JD .
SIAM JOURNAL ON NUMERICAL ANALYSIS, 1983, 20 (04) :812-814
[4]  
Frank J., 2006, Three-dimensional electron microscopy of macromolecular assemblies: visualization of biological molecules in their native state
[5]  
Golub G. H., 1996, MATRIX COMPUTATIONS
[6]   Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions [J].
Halko, N. ;
Martinsson, P. G. ;
Tropp, J. A. .
SIAM REVIEW, 2011, 53 (02) :217-288
[7]   ESTIMATING THE LARGEST EIGENVALUE BY THE POWER AND LANCZOS ALGORITHMS WITH A RANDOM START [J].
KUCZYNSKI, J ;
WOZNIAKOWSKI, H .
SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1992, 13 (04) :1094-1122
[8]   Randomized algorithms for the low-rank approximation of matrices [J].
Liberty, Edo ;
Woolfe, Franco ;
Martinsson, Per-Gunnar ;
Rolchlin, Vladimir ;
Tyger, Mark .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2007, 104 (51) :20167-20172
[9]  
Martinsson P.-G., 2011, P NEUR INF PROC SYST
[10]   Computational and performance aspects of PCA-based face-recognition algorithms [J].
Moon, H ;
Phillips, PJ .
PERCEPTION, 2001, 30 (03) :303-321