Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof

Cited by: 673
Authors
Al-Tamimi, Asma [1]
Lewis, Frank L. [2]
Abu-Khalaf, Murad [3]
Affiliations
[1] Hashemite Univ, Zarqa 13115, Jordan
[2] Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118 USA
[3] MathWorks Inc, Natick, MA 01760 USA
Source
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS | 2008, Vol. 38, No. 4
Funding
US National Science Foundation
Keywords
adaptive critics; approximate dynamic programming (ADP); Hamilton-Jacobi-Bellman (HJB); policy iteration; value iteration
DOI
10.1109/TSMCB.2008.926614
CLC Number
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman (HJB) equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. Two standard neural networks (NNs) are used: a critic NN approximates the value function, whereas an action NN approximates the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is typically used.
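As a concrete illustration of the LQR special case discussed in the abstract, the following minimal Python sketch runs the value-iteration recursion that HDP realizes for the DT LQR: P_{i+1} = Q + A'P_i A - A'P_i B (R + B'P_i B)^{-1} B'P_i A, starting from P_0 = 0. This is not the authors' two-NN implementation: the paper's point is that HDP can run without knowledge of A, whereas this sketch uses A directly, and the matrices A, B, Q, R are illustrative assumptions, chosen only to show what the iteration converges to (the solution of the discrete algebraic Riccati equation).

    import numpy as np

    # Value-iteration recursion for the DT LQR, starting from P_0 = 0:
    #   K_i     = (R + B^T P_i B)^{-1} B^T P_i A    (greedy policy u = -K_i x)
    #   P_{i+1} = Q + A^T P_i A - A^T P_i B K_i     (value update)
    # Illustrative system (NOT from the paper): a discretized double integrator.
    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.005],
                  [0.1]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    P = np.zeros((2, 2))  # V_0(x) = x^T P_0 x = 0
    for i in range(1000):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < 1e-12:
            P = P_next
            break
        P = P_next

    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    print("Riccati solution P:\n", P)
    print("Optimal feedback gain K:\n", K)

For the same (A, B, Q, R), the resulting P can be checked against scipy.linalg.solve_discrete_are, which solves the same Riccati equation directly.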
Pages: 943-949
Page count: 7