Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition

Cited by: 175
Authors
Drineas, Petros [1]
Kannan, Ravi [2]
Mahoney, Michael W. [3]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[3] Yale Univ, Dept Math, New Haven, CT 06520 USA
Keywords
randomized algorithms; Monte Carlo methods; massive data sets; CUR matrix decomposition;
DOI
10.1137/S0097539704442702
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In many applications the data consist of (or may be naturally formulated as) an $m \times n$ matrix $A$ which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an $m \times n$ matrix $A$, compute approximations to $A$ that are the product of three smaller matrices, $C$, $U$, and $R$, each of which may be computed rapidly. Let $A' = CUR$ denote the computed approximate decomposition; both algorithms have provable bounds for the error matrix $A - A'$. In the first algorithm, $c$ columns of $A$ and $r$ rows of $A$ are randomly chosen. If the $m \times c$ matrix $C$ consists of those $c$ columns of $A$ (after appropriate rescaling) and the $r \times n$ matrix $R$ consists of those $r$ rows of $A$ (also after appropriate rescaling), then the $c \times r$ matrix $U$ may be calculated from $C$ and $R$. For any matrix $X$, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. It is proven that $\|A - A'\|_\xi \le \min_{D:\,\mathrm{rank}(D)\le k} \|A - D\|_\xi + \mathrm{poly}(k, 1/c)\,\|A\|_F$ holds in expectation and with high probability, both for $\xi = 2, F$ and for all $k = 1, \ldots, \mathrm{rank}(A)$; thus by appropriate choice of $k$, $\|A - A'\|_2 \le \epsilon \|A\|_F$ also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix $A$ in RAM, provided it can make two passes over the matrix stored in external memory and use $O(m + n)$ additional RAM (assuming that $c$ and $r$ are constants, independent of the size of the input). The second algorithm is similar except that it approximates the matrix $C$ by randomly sampling a constant number of rows of $C$. Thus it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank-$k$ approximation) that is at most $\epsilon \|A\|_F$, both algorithms take time which is a low-degree polynomial in $k$, $1/\epsilon$, and $1/\delta$, where $\delta > 0$ is a failure probability; the first takes time linear in $\max(m, n)$ and the second takes time independent of $m$ and $n$. The proofs for the error bounds make important use of matrix perturbation theory and of previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
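As an informal illustration of the sampling-and-rescaling scheme described in the abstract, the following Python/NumPy sketch builds a CUR approximation by sampling columns and rows with probabilities proportional to their squared Euclidean norms. It is a minimal sketch, not the paper's algorithm: taking U as the pseudoinverse of the rescaled row-column intersection block is an assumption made here for simplicity (the paper computes U from C and R with specific guarantees), and the helper name cur_sketch and all parameter choices are likewise illustrative.

import numpy as np

def cur_sketch(A, c, r, seed=None):
    """Illustrative CUR sketch (hypothetical helper, not the paper's exact algorithm).

    Samples c columns and r rows of A with probabilities proportional to their
    squared Euclidean norms, rescales them, and takes U to be the Moore-Penrose
    pseudoinverse of the rescaled intersection block, so that A ~= C @ U @ R.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    total = (A ** 2).sum()

    # Column and row sampling probabilities: p_j ∝ ||A^(j)||^2, q_i ∝ ||A_(i)||^2.
    p = (A ** 2).sum(axis=0) / total
    q = (A ** 2).sum(axis=1) / total

    cols = rng.choice(n, size=c, replace=True, p=p)
    rows = rng.choice(m, size=r, replace=True, p=q)

    # Rescale each sampled column by 1/sqrt(c * p_j) and each sampled row by
    # 1/sqrt(r * q_i), so that the sampled outer products are unbiased in expectation.
    C = A[:, cols] / np.sqrt(c * p[cols])
    R = A[rows, :] / np.sqrt(r * q[rows])[:, None]

    # Rescaled intersection of sampled rows and columns; U = W^+ (a simplifying
    # assumption here; the paper constructs U from C and R differently).
    W = A[np.ix_(rows, cols)] / (np.sqrt(r * q[rows])[:, None] * np.sqrt(c * p[cols])[None, :])
    U = np.linalg.pinv(W)
    return C, U, R

# Example usage on a random matrix.
A = np.random.default_rng(0).standard_normal((1000, 500))
C, U, R = cur_sketch(A, c=50, r=50, seed=1)
err = np.linalg.norm(A - C @ U @ R, "fro") / np.linalg.norm(A, "fro")
print(f"relative Frobenius error of the CUR sketch: {err:.3f}")

The relative error printed above is only a sanity check; the quantitative bounds stated in the abstract refer to the paper's construction, not to this simplified variant.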
Pages: 184-206
Number of pages: 23