Approximating a gram matrix for improved kernel-based learning (Extended abstract)

Cited by: 23
Authors
Drineas, P [1 ]
Mahoney, MW [2]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Yale Univ, Dept Math, New Haven, CT 06520 USA
Source
LEARNING THEORY, PROCEEDINGS | 2005 / Vol. 3559
Keywords
DOI
10.1007/11503415_22
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an $n \times n$ Gram matrix $G$ such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^+ C^T$, where $C$ is a matrix consisting of a small number $c$ of columns of $G$ and $W_k$ is the best rank-$k$ approximation to $W$, the matrix formed by the intersection between those $c$ columns of $G$ and the corresponding $c$ rows of $G$. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-$k$ approximation to $G$. We prove that by choosing $O(k/\epsilon^4)$ columns, $\|G - C W_k^+ C^T\|_\xi \le \|G - G_k\|_\xi + \epsilon \sum_{i=1}^{n} G_{ii}^2$, both in expectation and with high probability, for both $\xi = 2, F$, and for all $k$ with $0 \le k \le \mathrm{rank}(W)$. This approximation can be computed using $O(n)$ additional space and time, after making two passes over the data from external storage.
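The abstract describes the construction in enough detail to sketch it: sample $c$ columns of $G$ nonuniformly, rescale them, take the best rank-$k$ approximation of the $c \times c$ intersection block, and combine as $C W_k^+ C^T$. The following is a minimal NumPy sketch under the assumption that the data-dependent sampling probabilities are $p_i = G_{ii}^2 / \sum_j G_{jj}^2$ and that sampled columns are rescaled by $1/\sqrt{c\,p_i}$; the function name, tolerance, and return form are illustrative, not taken from the paper.

```python
import numpy as np

def gram_cwc_approximation(G, c, k, rng=None):
    """Sketch of a low-rank Gram-matrix approximation of the form C W_k^+ C^T.

    Columns are sampled with replacement with probability p_i proportional to
    G_ii^2 and rescaled by 1/sqrt(c * p_i), mirroring the nonuniform,
    data-dependent sampling described in the abstract (assumed details).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = G.shape[0]

    # Data-dependent sampling probabilities p_i proportional to G_ii^2.
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()

    # Sample c column indices with replacement; form the rescaled C and W.
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale              # n x c: sampled, rescaled columns of G
    W = C[idx, :] * scale[:, None]     # c x c: intersection, rescaled on both sides

    # Pseudoinverse of the best rank-k approximation to W.
    U, s, Vt = np.linalg.svd(W)
    r = min(k, int(np.sum(s > 1e-12)))          # ignore (near-)zero singular values
    Wk_pinv = (Vt[:r].T / s[:r]) @ U[:, :r].T   # V_r S_r^{-1} U_r^T

    # In practice one would keep the factored form (C, Wk_pinv) rather than
    # materializing the n x n product; it is formed here only for clarity.
    return C @ Wk_pinv @ C.T
```

For fixed $c$ and $k$, forming $C$, $W$, and $W_k^+$ costs $O(nc)$ space and $O(nc + c^3)$ time beyond access to the sampled columns, which is consistent with the abstract's claim that the approximation needs only $O(n)$ additional space and time.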
Pages: 323-337
Number of pages: 15