37 entries in total
[21]
MAHADEVAN S, 1996, P 13 INT C MACH LEAR, P202
[22]
MAHADEVAN S, P 11 INT FLAIRS C, P372
[23]
Puterman M.L., 2008, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics
[24]
A STOCHASTIC APPROXIMATION METHOD [J]. ANNALS OF MATHEMATICAL STATISTICS, 1951, 22(03): 400-407
[25]
Ross S. M., 1983, STOCHASTIC PROCESSES
[26]
Sethi SP, 1994, HIERARCHICAL DECISIO
[27]
SINGH S, 1996, NEURAL INFORMATION P
[28]
Sutton R. S., 1988, Machine Learning, V3, P9, DOI 10.1023/A:1022633531479
[29]
Sutton R.S., 1990, P 7 INT C MACHINE LE, P216
[30]
Sutton R.S., 1984, Temporal Credit Assignment in Reinforcement Learning