14 entries in total
[1]
[2]
[3]
[4]
[5]
[6]
[7]
Q-learning[J] Christopher J. C. H. Watkins;Peter Dayan Machine Learning 1992,
[8]
Learning to Predict by the Methods of Temporal Differences[J] Richard S. Sutton Machine Learning 1988,
[9]
End-to-end navigation strategy with deep reinforcement learning for mobile robots SHI H;SHI L;XU M; et al; IEEE Transactions on Industrial Informatics 2020,
[10]
Counterfactual multi-agent policy gradients FOERSTER J;FARQUHAR G;AFOURAS T; et al; The 32nd AAAI Conference on Artificial Intelligence 2018,

