FAST EXACT MULTIPLICATION BY THE HESSIAN

Cited by: 324
Authors
PEARLMUTTER, BA
DOI
10.1162/neco.1994.6.1.147
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Just storing the Hessian $H$ (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error $E$ with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like $H$ is to compute its product with various vectors, we derive a technique that directly calculates $Hv$, where $v$ is an arbitrary vector. To calculate $Hv$, we first define a differential operator $R_v\{f(w)\} = (\partial/\partial r) f(w + rv)|_{r=0}$, note that $R_v\{\nabla_w\} = Hv$ and $R_v\{w\} = v$, and then apply $R_v\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing $Hv$, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of $H$, obviating any need to calculate the full Hessian.
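The operator described in the abstract can be illustrated with a small sketch. This is not the paper's code: the toy error function, the `Dual` class, and the names `grad_E` and `hessian_vector_product` are all assumptions made for illustration. The idea is that evaluating the gradient equations at $w + rv$ while tracking only first-order terms in $r$ is one way to apply $R_v\{\cdot\}$, so the first-order part of the gradient is $Hv$ and the full Hessian is never formed.

```python
import math


class Dual:
    """Value a + b*r truncated at first order in r (hypothetical helper)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: R{a*b} = a*R{b} + R{a}*b
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def tanh(x):
    """tanh for floats and Duals; R{tanh(a)} = (1 - tanh(a)^2) * R{a}."""
    if isinstance(x, Dual):
        t = math.tanh(x.val)
        return Dual(t, (1.0 - t * t) * x.tan)
    return math.tanh(x)


def grad_E(w, x=(0.5, -1.0), target=0.3):
    """Hand-written backpropagation for the assumed toy error
    E(w) = 0.5 * (w2 * tanh(w0*x0 + w1*x1) - target)^2."""
    w0, w1, w2 = w
    a = w0 * x[0] + w1 * x[1]        # hidden pre-activation
    h = tanh(a)                      # hidden activation
    e = w2 * h - target              # output error, dE/dy
    da = e * w2 * (1.0 - h * h)      # dE/da, backpropagated through tanh
    return [da * x[0], da * x[1], e * h]


def hessian_vector_product(w, v):
    """Run the gradient equations on w + r*v and read off the r-coefficient."""
    seeded = [Dual(wi, vi) for wi, vi in zip(w, v)]   # R{w} = v
    return [g.tan for g in grad_E(seeded)]            # R{grad} = Hv


if __name__ == "__main__":
    print(hessian_vector_product([0.2, -0.4, 0.7], [1.0, 2.0, -0.5]))
```

In this sketch one product costs a single pass through the gradient equations with roughly doubled arithmetic, which is in line with the abstract's claim that computing $Hv$ takes about as much computation as a gradient evaluation. A finite-difference estimate of the form $(\nabla_w(w + \epsilon v) - \nabla_w(w - \epsilon v)) / 2\epsilon$ gives a convenient sanity check, although it is approximate rather than exact.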
Pages: 147-160
Page count: 14