A survey of transformers

Times Cited: 365
Authors
Lin, Tianyang
Wang, Yuxin
Liu, Xiangyang
Qiu, Xipeng [1]
Affiliation
[1] Fudan University, School of Computer Science, Shanghai 200433, People's Republic of China
Source
AI OPEN | 2022, Vol. 3
Keywords
Transformer; Self-attention; Pre-trained models; Deep learning; Attention
DOI
10.1016/j.aiopen.2022.10.001
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted great interest from both academic and industrial researchers. To date, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of the various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
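For readers unfamiliar with the mechanism the survey builds on, the following is a minimal sketch of the scaled dot-product self-attention at the core of the vanilla Transformer. It uses NumPy, and all array shapes, weight matrices, and function names are illustrative assumptions rather than code from the paper.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention over a sequence X of shape (seq_len, d_model):
    # project the inputs to queries, keys, and values, compare queries to keys,
    # and return a weighted sum of the values for each position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # scaled dot-product similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the key positions
    return weights @ V

# Toy usage: 5 tokens with model dimension 8 (values chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)                 # (5, 8)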
Pages: 111-132
Page count: 22