Adaptive critic designs for discrete-time zero-sum games with application to H∞ control

Cited by: 127

Authors:

Al-Tamimi, Asma [1]

Abu-Khalaf, Murad [1]

Lewis, Frank L. [1]

Affiliation:

[1] Univ Texas, Automat & Robot Res Inst, Ft Worth, TX 76118 USA

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2007 / Vol. 37 / No. 01

Funding:

National Science Foundation (USA);

Keywords:

adaptive critics; approximate dynamic programming (ADP); H-infinity optimal control; policy iteration; zero-sum game;

DOI:

10.1109/TSMCB.2006.880135

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In this correspondence, adaptive critic approximate dynamic programming designs are derived to solve the discrete-time zero-sum game in which the state and action spaces are continuous. This results in a forward-in-time reinforcement learning algorithm that converges to the Nash equilibrium of the corresponding zero-sum game. The results in this correspondence can be thought of as a way to solve the Riccati equation of the well-known discrete-time H-infinity optimal control problem forward in time. Two schemes are presented, namely: 1) a heuristic dynamic programming and 2) a dual-heuristic dynamic programming, to solve for the value function and the costate of the game, respectively. An H-infinity autopilot design for an F-16 aircraft is presented to illustrate the results.
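In the linear-quadratic case, the value function that the heuristic dynamic programming scheme targets is quadratic, V(x) = x^T P x, and P solves the game algebraic Riccati equation of discrete-time H-infinity control. As a rough, model-based illustration only (not the authors' algorithm, which learns the critic from data along system trajectories), the sketch below runs the value-iteration recursion on that Riccati equation starting from P_0 = 0. The system x_{k+1} = A x_k + B u_k + E w_k, the stage cost x^T Q x + u^T u - gamma^2 w^T w, and the function name `gare_value_iteration` are assumptions made for this example.

```python
import numpy as np


def gare_value_iteration(A, B, E, Q, gamma, max_iter=5000, tol=1e-10):
    """Model-based value iteration on the game algebraic Riccati equation.

    Assumed setting (illustrative, not taken from the paper's text):
      x_{k+1} = A x + B u + E w,  stage cost x'Qx + u'u - gamma^2 w'w,
      value V(x) = x'Px.  Returns P and gains L, K with u = L x (minimizer)
      and w = K x (maximizer).
    """
    n, m = B.shape
    q = E.shape[1]
    BE = np.hstack([B, E])  # stacked input matrix [B E]
    # R = diag(I_m, -gamma^2 I_q): control is penalized, disturbance is rewarded
    R = np.block([[np.eye(m), np.zeros((m, q))],
                  [np.zeros((q, m)), -gamma**2 * np.eye(q)]])

    def blocks(P):
        M = BE.T @ P @ BE + R  # [[I + B'PB, B'PE], [E'PB, E'PE - gamma^2 I]]
        N = BE.T @ P @ A       # [B'PA; E'PA]
        return M, N

    P = np.zeros((n, n))  # P_0 = 0, i.e. V_0(x) = 0
    for _ in range(max_iter):
        M, N = blocks(P)
        # P_{i+1} = A'P_iA + Q - [A'P_iB, A'P_iE] M_i^{-1} [B'P_iA; E'P_iA]
        P_next = A.T @ P @ A + Q - N.T @ np.linalg.solve(M, N)
        P_next = 0.5 * (P_next + P_next.T)  # symmetrize against round-off
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    else:
        raise RuntimeError("value iteration did not converge")

    M, N = blocks(P)
    G = -np.linalg.solve(M, N)  # saddle-point policy [u; w] = G x
    return P, G[:m], G[m:]


if __name__ == "__main__":
    A = np.array([[0.9, 0.2], [0.0, 0.7]])
    B = np.array([[0.0], [1.0]])
    E = np.array([[1.0], [0.0]])
    P, L, K = gare_value_iteration(A, B, E, np.eye(2), gamma=10.0)
    print("P =", P, "L =", L, "K =", K, sep="\n")
```

Solving the joint stationarity condition for (u, w) with one linear solve on the stacked matrix M gives both players' feedback gains at once. In this sketch the recursion is only well posed while E^T P E - gamma^2 I stays negative definite, which is why the example uses a comfortably large gamma; setting E = 0 removes the disturbance player, and the recursion then reduces to ordinary discrete-time LQR value iteration.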


Pages: 240–247

Page count: 8

Number of references: 24
