A survey of transformers

Times Cited: 365
Authors
Lin, Tianyang
Wang, Yuxin
Liu, Xiangyang
Qiu, Xipeng [1]
Affiliation
[1] Fudan University, School of Computer Science, Shanghai 200433, People's Republic of China
Source
AI OPEN | 2022, Vol. 3
Keywords
Transformer; Self-attention; Pre-trained models; Deep learning; Attention
DOI
10.1016/j.aiopen.2022.10.001
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted great interest from both academic and industrial researchers. To date, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of the various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
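For readers unfamiliar with the mechanism the survey builds on, the following is a minimal sketch of the scaled dot-product self-attention at the core of the vanilla Transformer. It uses NumPy, and all array shapes, weight matrices, and function names are illustrative assumptions rather than code from the paper.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention over a sequence X of shape (seq_len, d_model):
    # project the inputs to queries, keys, and values, compare queries to keys,
    # and return a weighted sum of the values for each position.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # scaled dot-product similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the key positions
    return weights @ V

# Toy usage: 5 tokens with model dimension 8 (values chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)                 # (5, 8)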
Pages: 111-132
Page count: 22