A Decision-Making Model for Evolutionary Games Based on Q-Learning

Cited by: 3

Authors:

Liu Weibing (刘伟兵) [1]

Li Min (黎民) [1]

Wang Xianjia (王先甲) [2]

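Affiliations:

[1] School of Political Science and Public Administration, Wuhan University

[2] Institute of Systems Engineering, Wuhan University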

Source:

Engineering Journal of Wuhan University (武汉大学学报(工学版)) | 2008, Issue 04

Keywords:

evolutionary game; reinforcement learning; Q-learning; decision-making model

DOI:

Not available

Chinese Library Classification (CLC) number:

F224.32 [Game theory]

Discipline classification code:

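Abstract:

Based on the Q reinforcement learning algorithm, a decision-making model for agents in evolutionary games is established. Because reinforcement learning does not require a model of the environment and can be applied to problems with incomplete and uncertain information, the Q reinforcement learning algorithm is introduced into evolutionary games. Two Q-learning decision models for evolutionary games are studied: a single-agent Q-learning decision model and a multi-agent Q-learning decision model. The choice of decision model and algorithm for evolutionary games with different structures is also discussed. The results of a simulation example indicate that the Q-learning-based decision model can guide agents to learn and select the optimal strategy.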
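The record gives only the abstract, so the paper's actual models are not reproduced here. As a rough illustration of the single-agent case, the following minimal sketch applies the standard tabular Q-learning update, Q(a) ← Q(a) + α[r + γ·max Q − Q(a)], to a repeated two-action game in which the other player is treated as part of the environment. The payoff matrix, the `always_cooperate` opponent, the single-state simplification, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import random

# Hypothetical payoffs to the learning agent: PAYOFF[my_action][opponent_action].
# Actions: 0 = cooperate, 1 = defect (prisoner's dilemma values, for illustration only).
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def q_update(q, action, reward, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a repeated game modeled as a single state."""
    target = reward + gamma * max(q)          # bootstrapped return estimate
    q[action] += alpha * (target - q[action])  # move the estimate toward the target
    return q[action]


def choose_action(q, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return q.index(max(q))


def train(opponent, rounds=10000, epsilon=0.1, seed=0):
    """Play the repeated game against `opponent` and return the learned Q-values."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for t in range(rounds):
        a = choose_action(q, epsilon, rng)
        b = opponent(t)                        # opponent's action in round t
        q_update(q, a, PAYOFF[a][b])
    return q


def always_cooperate(t):
    """Assumed fixed opponent strategy: cooperate in every round."""
    return 0


q_values = train(always_cooperate)
best_action = q_values.index(max(q_values))
print(q_values, best_action)
```

Against this unconditional cooperator, defecting pays 5 per round versus 3 for cooperating, so the learned greedy action should be 1 (defect), with its Q-value approaching 5 / (1 − 0.9) = 50. A multi-agent variant would need each agent to account for the other learners' actions, for example along the lines of the Markov-game and Nash-Q formulations listed in the references.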

References

Pages: 122-125

Page count: 4

8 entries in total

[1] Learning from Delayed Rewards. Watkins C. 1989
[2] Ants can play prisoner's dilemma. Thlol Y, Acan A. 2003 IEEE International Conference on Systems, Man and Cybernetics. 2003
[3] Nash Q-learning for general-sum stochastic games. Hu J, Wellman M P. Journal of Machine Learning Research. 2003
[4] Another approach to mutation and learning. Amir M, Berninghaus S K. Games and Economic Behavior. 1996
[5] Markov games as a framework for multi-agent reinforcement learning. Littman M L. Proceedings of the Eleventh International Conference on Machine Learning. 1994
[6] Emergence of cooperation and evolutionary stability in finite populations. Martin Nowak, et al. Nature. 2004
[7] Genetic algorithms and evolutionary games. Yao X, Darwen P. Commerce, Complexity and Evolution. 2000
[8] Evolution and the Theory of Games. Smith J M. 1982
