[3] Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms[J]. Machine Learning, 2000, 38: 287-308.
[4] Christopher J.C.H. Watkins, Peter Dayan. Technical Note: Q-Learning[J]. Machine Learning, 1992.