Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem

Cited by: 151
Authors
Ernst, Damien [1 ,2 ]
Glavic, Mevludin [2 ]
Capitanescu, Florin [2 ]
Wehenkel, Louis [2 ]
Affiliations
[1] Belgian Natl Fund Sci Res, B-1000 Brussels, Belgium
[2] Univ Liege, Dept Elect Engn & Comp Sci, B-4000 Liege, Belgium
Source
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS | 2009, Vol. 39, No. 2
Keywords
Approximate dynamic programming (ADP); electric power oscillations damping; fitted Q iteration; interior-point method (IPM); model predictive control (MPC); reinforcement learning (RL); tree-based supervised learning (SL); STABILITY; OPTIMIZATION; STRATEGIES
DOI
10.1109/TSMCB.2008.2007630
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available.
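The RL approach described in the abstract is fitted Q iteration with tree-based supervised learning. As a rough illustration of the idea only (not the authors' implementation, and not the exact controller evaluated in the paper), the following Python sketch assumes a finite action set, a batch of one-step transitions (state, action, cost, next state), scikit-learn's ExtraTreesRegressor standing in for the tree-based learner, and arbitrary defaults for the discount factor and iteration count:

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.98, n_iterations=50):
    # transitions: list of (x, u, c, x_next); x, x_next are 1-D state arrays,
    # u is a scalar action, c is the instantaneous cost (assumed format).
    X = np.array([np.append(x, u) for x, u, _, _ in transitions])
    costs = np.array([c for _, _, c, _ in transitions])
    next_states = np.array([x_next for _, _, _, x_next in transitions])

    model = None
    for _ in range(n_iterations):
        if model is None:
            targets = costs  # first iteration: Q_1 equals the instantaneous cost
        else:
            # Bellman backup: Q_N(x, u) = c + gamma * min over u' of Q_{N-1}(x', u')
            q_next = np.column_stack([
                model.predict(np.column_stack(
                    [next_states, np.full(len(next_states), a)]))
                for a in actions])
            targets = costs + gamma * q_next.min(axis=1)
        # batch-mode supervised learning step on (state-action, target) pairs
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model

def greedy_policy(model, x, actions):
    # closed-loop policy: pick the action minimizing the learned Q-function
    q_values = [model.predict(np.append(x, a).reshape(1, -1))[0] for a in actions]
    return actions[int(np.argmin(q_values))]

The resulting greedy_policy plays the role of the closed-loop controller inferred from trajectories, whereas the MPC baseline described in the abstract instead solves, at each decision step, a constrained minimization in which the system dynamics appear as equality constraints, using an interior-point solver.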
Pages: 517-529
Number of pages: 13