24 entries in total
[1]
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Jaakkola T, Jordan M, Singh S. Neural Computation. 1994
[2]
Reinforcement Learning for the Adaptive Control of Nonlinear Systems. Albert Y Zomaya. IEEE Transactions on Systems, Man, and Cybernetics. 1994
[3]
Associative Reinforcement Learning: Functions in k-DNF. Leslie Pack Kaelbling. Machine Learning. 1994 (3)
[4]
Associative Reinforcement Learning: A Generate and Test Algorithm. Leslie Pack Kaelbling. Machine Learning. 1994 (3)
[5]
Technical Note: Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992 (3)
[6]
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Ronald J. Williams. Machine Learning. 1992 (3)
[7]
Learning to Predict by the Methods of Temporal Differences. Richard S. Sutton. Machine Learning. 1988 (1)
[8]
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Williams R J. Machine Learning. 1992
[9]
Technical Note: Q-Learning. Watkins C J C H, Dayan P. Machine Learning. 1992
[10]
Learning to Predict by the Methods of Temporal Differences. Sutton R S. Machine Learning. 1988