Approximating a gram matrix for improved kernel-based learning (Extended abstract)

Cited by: 23
Authors
Drineas, P [1 ]
Mahoney, MW [2]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Yale Univ, Dept Math, New Haven, CT 06520 USA
Source
LEARNING THEORY, PROCEEDINGS | 2005 / Vol. 3559
Keywords
DOI
10.1007/11503415_22
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an $n \times n$ Gram matrix $G$ such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^+ C^T$, where $C$ is a matrix consisting of a small number $c$ of columns of $G$ and $W_k$ is the best rank-$k$ approximation to $W$, the matrix formed by the intersection between those $c$ columns of $G$ and the corresponding $c$ rows of $G$. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-$k$ approximation to $G$. We prove that by choosing $O(k/\epsilon^4)$ columns, $\|G - C W_k^+ C^T\|_\xi \le \|G - G_k\|_\xi + \epsilon \sum_{i=1}^{n} G_{ii}^2$, both in expectation and with high probability, for both $\xi = 2, F$, and for all $k$ with $0 \le k \le \mathrm{rank}(W)$. This approximation can be computed using $O(n)$ additional space and time, after making two passes over the data from external storage.
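The abstract describes the construction in enough detail to sketch it: sample $c$ columns of $G$ nonuniformly, rescale them, take the best rank-$k$ approximation of the $c \times c$ intersection block, and combine as $C W_k^+ C^T$. The following is a minimal NumPy sketch under the assumption that the data-dependent sampling probabilities are $p_i = G_{ii}^2 / \sum_j G_{jj}^2$ and that sampled columns are rescaled by $1/\sqrt{c\,p_i}$; the function name, tolerance, and return form are illustrative, not taken from the paper.

```python
import numpy as np

def gram_cwc_approximation(G, c, k, rng=None):
    """Sketch of a low-rank Gram-matrix approximation of the form C W_k^+ C^T.

    Columns are sampled with replacement with probability p_i proportional to
    G_ii^2 and rescaled by 1/sqrt(c * p_i), mirroring the nonuniform,
    data-dependent sampling described in the abstract (assumed details).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = G.shape[0]

    # Data-dependent sampling probabilities p_i proportional to G_ii^2.
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()

    # Sample c column indices with replacement; form the rescaled C and W.
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale              # n x c: sampled, rescaled columns of G
    W = C[idx, :] * scale[:, None]     # c x c: intersection, rescaled on both sides

    # Pseudoinverse of the best rank-k approximation to W.
    U, s, Vt = np.linalg.svd(W)
    r = min(k, int(np.sum(s > 1e-12)))          # ignore (near-)zero singular values
    Wk_pinv = (Vt[:r].T / s[:r]) @ U[:, :r].T   # V_r S_r^{-1} U_r^T

    # In practice one would keep the factored form (C, Wk_pinv) rather than
    # materializing the n x n product; it is formed here only for clarity.
    return C @ Wk_pinv @ C.T
```

For fixed $c$ and $k$, forming $C$, $W$, and $W_k^+$ costs $O(nc)$ space and $O(nc + c^3)$ time beyond access to the sampled columns, which is consistent with the abstract's claim that the approximation needs only $O(n)$ additional space and time.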
Pages: 323-337
Number of pages: 15