A comprehensive survey of multiagent reinforcement learning

Cited by: 1342

Authors:

Busoniu, Lucian ^{[1]}

Babuska, Robert ^{[1]}

De Schutter, Bart ^{[1,2]}

Affiliations:

[1] Delft Univ Technol, Fac Mech Engn, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands

[2] Delft Univ Technol, Marine & Transport Technol Dept, NL-2628 CD Delft, Netherlands

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS | 2008 / Vol. 38 / No. 02

Keywords:

distributed control; game theory; multiagent systems; reinforcement learning;

DOI:

10.1109/TSMCC.2007.913919

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

Citation

Pages: 156-172

Page count: 17
