Reinforcement Learning: Principles, Algorithms, and Applications in Intelligent Control

Cited by: 11

Author:

Yan Pingfan (阎平凡)

Affiliation:

[1] Department of Automation, Tsinghua University, Beijing

Source:

Keywords:

reinforcement learning; learning control; intelligent control

DOI:

10.13976/j.cnki.xk.1996.01.007

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

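Abstract:

This paper surveys the principles of reinforcement learning, its main algorithms, its neural-network-based implementations, and its role in intelligent control, and discusses problems that call for further research.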

References

Pages: 28-34, 43

Number of pages: 8

24 entries in total

[1]

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Jaakkola T, Jordan M, Singh S. Neural Computation. 1994

[2]

Reinforcement Learning for the Adaptive Control of Nonlinear Systems. Albert Y Zomaya. IEEE Transactions on Systems, Man, and Cybernetics. 1994

[3]

Associative Reinforcement Learning: Functions in k-DNF [J]. Leslie Pack Kaelbling. Machine Learning. 1994 (3)

[4]

Associative Reinforcement Learning: A Generate and Test Algorithm [J]. Leslie Pack Kaelbling. Machine Learning. 1994 (3)

[5]

Technical Note: Q-Learning [J]. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992 (3)

[6]

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning [J]. Ronald J. Williams. Machine Learning. 1992 (3)

[7]

Learning to Predict by the Methods of Temporal Differences [J]. Richard S. Sutton. Machine Learning. 1988 (1)

[8]

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Williams R J. Machine Learning. 1992

[9]

Technical Note: Q-Learning. Watkins C J C H. Machine Learning. 1992

[10]

Learning by the Method of Temporal Differences. Sutton R S. Machine Learning. 1988