ON THE USE OF STOCHASTIC HESSIAN INFORMATION IN OPTIMIZATION METHODS FOR MACHINE LEARNING

Cited by: 157
Authors
Byrd, Richard H. [1 ]
Chin, Gillian M. [2 ]
Neveitt, Will [3 ]
Nocedal, Jorge [4 ]
Affiliations
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
[3] Google Res, Mountain View, CA 94043 USA
[4] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation;
Keywords
unconstrained optimization; stochastic optimization; machine learning;
DOI
10.1137/10079923X
CLC Number
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation approach. Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
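The abstract's central idea — a Newton-CG iteration in which the gradient uses the full (or large) sample while Hessian-vector products inside CG use a much smaller subsample — can be illustrated with a minimal sketch. The model (regularized logistic regression), sample fraction, fixed unit step, and all function names below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def make_data(n=500, d=5, seed=0):
    """Synthetic binary classification data with labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)
    return X, y

def loss_grad(w, X, y, lam=1e-3):
    """L2-regularized logistic loss and its gradient over the full sample."""
    z = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * (w @ w)
    p = 1.0 / (1.0 + np.exp(z))          # sigmoid(-z)
    grad = -(X.T @ (y * p)) / len(y) + lam * w
    return loss, grad

def hessian_vec(w, v, X, y, lam=1e-3):
    """Matrix-free Hessian-vector product on the given (sub)sample."""
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-z))
    d = s * (1.0 - s)                    # per-sample curvature weights
    return X.T @ (d * (X @ v)) / len(y) + lam * v

def cg(hv, b, max_iter=20, tol=1e-8):
    """Standard conjugate gradient for hv(x) = b, hv symmetric PD."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = hv(p)
        x += (alpha := rs / (p @ Ap)) * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton_cg(X, y, iters=10, hess_frac=0.1, seed=1):
    """Newton-CG: full-sample gradient, subsampled Hessian-vector products."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        _, g = loss_grad(w, X, y)                       # full sample
        idx = rng.choice(n, size=max(1, int(hess_frac * n)), replace=False)
        Xs, ys = X[idx], y[idx]                         # small Hessian sample
        step = cg(lambda v: hessian_vec(w, v, Xs, ys), -g)
        w = w + step                                    # fixed unit step (sketch only)
    return w
```

Because each CG iteration touches only the subsample, its cost is roughly `hess_frac` times that of a full Hessian-vector product, while the search direction is still driven by the exact (full-sample) gradient.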
Pages: 977-995
Page count: 19