共 25 条
[1]
Barto A. G.(1995)Learning to act using real-time dynamic programming Artificial Intelligence 72 81-138
[2]
Bradtke S. J.(1992)The convergence of TD(λ) for general λ Machine Learning 8 341-362
[3]
Singh S.(1996)Exploration bonuses and dual control Machine Learning 25 5-22
[4]
Dayan P.(1994)On the convergence of stochastic iterative dynamic programming algorithms Neural Computation 6 1185-1201
[5]
Dayan P.(1996)Reinforcement learning: A survey Journal of Artificial Intelligence Research 4 237-285
[6]
Sejnowski T. J.(1997)Reinforcement learning for dynamic channel allocation in cellular telephone systems Advances in neural information processing systems 9 974-980
[7]
Jaakkola T.(1996)Reinforcement learning with replacing eligibility traces Machine Learning 22 123-158
[8]
Jordan M. I.(1994)An upper bound on the loss from approximate optimal-value functions Machine Learning 16 227-233
[9]
Singh S.(1988)Learning to predict by the method of temporal differences Machine Learning 3 9-44
[10]
Kaelbling L. P.(1995)Temporal difference learning and TD-gammon Communications of the ACM 38 58-68