Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Cited by: 173
Authors
Bakhtin, Anton [1]
Brown, Noam [1]
Dinan, Emily [1]
Farina, Gabriele [1]
Flaherty, Colin [1]
Fried, Daniel [1,2]
Goff, Andrew [1]
Gray, Jonathan [1]
Hu, Hengyuan [1,3]
Jacob, Athul Paul [1,4]
Komeili, Mojtaba [1]
Konath, Karthik [1]
Kwon, Minae [1,3]
Lerer, Adam [1]
Lewis, Mike [1]
Miller, Alexander H. [1]
Mitts, Sasha [1]
Renduchintala, Adithya [1]
Roller, Stephen [1]
Rowe, Dirk [1]
Shi, Weiyan [1,5]
Spisak, Joe [1]
Wei, Alexander [1,6]
Wu, David [1]
Zhang, Hugh [1,7]
Zijlstra, Markus [1]
Affiliations
[1] Meta AI, 1 Hacker Way, Menlo Pk, CA 94025 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[3] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[5] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[6] Univ Calif Berkeley, Dept Comp Sci, Berkeley, CA 94720 USA
[7] Harvard Univ, EconCS Grp, Cambridge, MA 02138 USA
Keywords
GO; AI;
DOI
10.1126/science.ade9097
Abstract
Despite much progress in training artificial intelligence (AI) systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.
Pages: 1067+ (8 pages)