ON THE USE OF STOCHASTIC HESSIAN INFORMATION IN OPTIMIZATION METHODS FOR MACHINE LEARNING

Cited by: 157
Authors
Byrd, Richard H. [1 ]
Chin, Gillian M. [2 ]
Neveitt, Will [3 ]
Nocedal, Jorge [4 ]
Affiliations
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
[3] Google Res, Mountain View, CA 94043 USA
[4] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation;
Keywords
unconstrained optimization; stochastic optimization; machine learning;
DOI
10.1137/10079923X
CLC Number
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation approach. Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
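The abstract's central idea — a Newton-CG iteration in which the gradient uses the full (or large) sample while Hessian-vector products inside CG use a much smaller subsample — can be illustrated with a minimal sketch. The model (regularized logistic regression), sample fraction, fixed unit step, and all function names below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def make_data(n=500, d=5, seed=0):
    """Synthetic binary classification data with labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)
    return X, y

def loss_grad(w, X, y, lam=1e-3):
    """L2-regularized logistic loss and its gradient over the full sample."""
    z = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * (w @ w)
    p = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
    grad = -(X.T @ (y * p)) / len(y) + lam * w
    return loss, grad

def hessian_vec(w, v, X, y, lam=1e-3):
    """Matrix-free Hessian-vector product on the given (sub)sample."""
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-z))
    d = s * (1.0 - s)                    # per-sample curvature weights
    return X.T @ (d * (X @ v)) / len(y) + lam * v

def cg(hv, b, max_iter=20, tol=1e-8):
    """Standard conjugate gradient for hv(x) = b, hv symmetric PD."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = hv(p)
        x += (alpha := rs / (p @ Ap)) * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton_cg(X, y, iters=10, hess_frac=0.1, seed=1):
    """Newton-CG: full-sample gradient, subsampled Hessian-vector products."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        _, g = loss_grad(w, X, y)                       # full sample
        idx = rng.choice(n, size=max(1, int(hess_frac * n)), replace=False)
        Xs, ys = X[idx], y[idx]                         # small Hessian sample
        step = cg(lambda v: hessian_vec(w, v, Xs, ys), -g)
        w = w + step                                    # fixed unit step (sketch only)
    return w
```

Because each CG iteration touches only the subsample, its cost is roughly `hess_frac` times that of a full Hessian-vector product, while the search direction is still driven by the exact (full-sample) gradient.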
Pages: 977-995
Page count: 19