Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems

Cited by: 527
Authors
Vrabie, Draguna [1]
Lewis, Frank [1]
Affiliations
[1] Univ Texas Arlington, Automat & Robot Res Inst, Ft Worth, TX 76118 USA
Funding
National Science Foundation (USA);
Keywords
Direct adaptive optimal control; Policy iteration; Neural networks; Online control;
DOI
10.1016/j.neunet.2009.03.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present, in a continuous-time framework, an online approach to direct adaptive optimal control with infinite-horizon cost for nonlinear systems. The algorithm converges online to the optimal control solution without knowledge of the internal system dynamics. Closed-loop dynamic stability is guaranteed throughout. The algorithm is based on a reinforcement learning scheme, namely policy iteration, and makes use of neural networks, in an Actor/Critic structure, to parametrically represent the control policy and the performance of the control system. The two neural networks are trained to express the optimal controller and the optimal cost function, which describes the infinite-horizon control performance. Convergence of the algorithm is proven under the realistic assumption that the two neural networks do not provide perfect representations of the nonlinear control and cost functions. The result is a hybrid control structure that involves a continuous-time controller and a supervisory adaptation structure, which operates on data sampled from the plant and from the continuous-time performance dynamics. Such a control structure is unlike any standard form of controller previously seen in the literature. Simulation results, obtained for two second-order nonlinear systems, are provided. (C) 2009 Elsevier Ltd. All rights reserved.
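The policy-iteration idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a scalar linear plant x' = a*x + b*u with cost integrand q*x^2 + r*u^2, and it replaces both neural networks with single-parameter approximators (critic V(x) = p*x^2, actor u = -k*x). All names and numeric values below are hypothetical. The learning step uses only measured states, the measured cost over an interval of length T, and the input gain b; the internal dynamics parameter a appears only inside the simulated plant.

```python
import math


def simulate_interval(x0, k, a, b, q, r, T, steps=200):
    """Run the plant x' = a*x + b*u under u = -k*x for T seconds (RK4).

    Returns the final state and the cost accumulated over the interval.
    The parameter `a` is used only here, inside the simulated plant.
    """
    def deriv(x):
        u = -k * x
        return a * x + b * u, q * x * x + r * u * u

    dt = T / steps
    x, cost = x0, 0.0
    for _ in range(steps):
        k1x, k1c = deriv(x)
        k2x, k2c = deriv(x + 0.5 * dt * k1x)
        k3x, k3c = deriv(x + 0.5 * dt * k2x)
        k4x, k4c = deriv(x + dt * k3x)
        x += dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        cost += dt / 6.0 * (k1c + 2 * k2c + 2 * k3c + k4c)
    return x, cost


def integral_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0,
                              k0=2.0, x0=1.0, T=0.1, iterations=10):
    """Alternate critic and actor updates along one closed-loop trajectory.

    k0 must be stabilizing (a - b*k0 < 0); this is an assumption of the sketch.
    """
    x, k, p = x0, k0, 0.0
    for _ in range(iterations):
        x_next, cost = simulate_interval(x, k, a, b, q, r, T)
        # Critic update (policy evaluation over one interval):
        #   p * x(t)^2 = measured cost + p * x(t+T)^2
        p = cost / (x * x - x_next * x_next)
        # Actor update (policy improvement): u = -(1/2) * (1/r) * b * dV/dx
        k = b * p / r
        x = x_next
    return p, k


def riccati_solution(a, b, q, r):
    """Closed-form optimal p for the scalar problem, used only as a check."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

Calling integral_policy_iteration() with the defaults and comparing the returned p against riccati_solution(1.0, 1.0, 1.0, 1.0) is one way to check that the iteration settles on the optimal cost parameter for this toy problem.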
Pages: 237-246
Number of pages: 10
Related papers
34 in total
[1]   Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach [J].
Abu-Khalaf, M ;
Lewis, FL .
AUTOMATICA, 2005, 41 (05) :779-791
[2]   Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation [J].
Abu-Khalaf, Murad ;
Lewis, Frank L. ;
Huang, Jie .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (12) :1989-1995
[3]  
[Anonymous], IEEE WORLD C COMP IN
[4]  
[Anonymous], 2004, IEEE T AUTOMAT CONTR, DOI 10.1109/TAC.1972.1100008
[5]  
[Anonymous], IEEE P CDC 89
[6]  
[Anonymous], 1989, LEARNING DELAYED REW
[7]   Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation [J].
Beard, RW ;
Saridis, GN ;
Wen, JT .
AUTOMATICA, 1997, 33 (12) :2159-2177
[8]  
Bertsekas, Dimitri, 1996, Neuro-Dynamic Programming
[9]   Coordinated machine learning and decision support for situation awareness [J].
Brannon, N. G. ;
Seiffertt, J. E. ;
Draelos, T. J. ;
Wunsch, D. C., II .
NEURAL NETWORKS, 2009, 22 (03) :316-325
[10]   Reinforcement learning in continuous time and space [J].
Doya, K .
NEURAL COMPUTATION, 2000, 12 (01) :219-245