Parameter-efficient fine-tuning of large-scale pre-trained language models

Cited by: 215
Authors
Ding, Ning [1 ,2 ]
Qin, Yujia [1 ,2 ]
Yang, Guang [1 ]
Wei, Fuchao [1 ]
Yang, Zonghan [1 ]
Su, Yusheng [1 ,2 ]
Hu, Shengding [1 ,2 ]
Chen, Yulin [3 ]
Chan, Chi-Min [1 ]
Chen, Weize [1 ,2 ]
Yi, Jing [1 ,2 ]
Zhao, Weilin [1 ,2 ]
Wang, Xiaozhi [1 ]
Liu, Zhiyuan [1 ,2 ]
Zheng, Hai-Tao [3 ]
Chen, Jianfei [1 ]
Liu, Yang [1 ]
Tang, Jie [1 ,2 ]
Li, Juanzi [1 ]
Sun, Maosong [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Acad Artificial Intelligence, Beijing, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
All Open Access; Hybrid Gold;
DOI
10.1038/s42256-023-00626-4
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the prevalence of pre-trained language models (PLMs) and the pre-training-fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, this line of work demonstrates that large-scale models can be effectively stimulated by the optimization of a few parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term, 'delta-tuning', where 'delta', a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are 'changed' during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. With comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.

Training a deep neural network can be costly, but training time is reduced when a pre-trained network can be adapted to different use cases. Ideally, only a small number of parameters need to be changed in this process of fine-tuning, which can then be more easily distributed. In this Analysis, different methods of fine-tuning with only a small number of parameters are compared on a large set of natural language processing tasks.
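A minimal sketch of the core delta-tuning idea described in the abstract, assuming a PyTorch-style model: freeze all pre-trained weights and optimize only a small 'delta' subset of parameters (here, the bias vectors, in the spirit of BitFit-style tuning). This is an illustration of the general principle, not the paper's specific method; the model and variable names are placeholders.

```python
# Illustrative delta-tuning sketch: freeze the pre-trained backbone and
# optimize only a tiny subset of parameters (the bias vectors here).
import torch
from torch import nn

# Stand-in for a pre-trained language model; any nn.Module works the same way.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2,
)

# 1. Freeze every parameter of the pre-trained backbone.
for param in model.parameters():
    param.requires_grad = False

# 2. Re-enable gradients only for the chosen 'delta' parameters (biases here).
delta_params = []
for name, param in model.named_parameters():
    if name.endswith("bias"):
        param.requires_grad = True
        delta_params.append(param)

# 3. The optimizer sees only the delta parameters, so adaptation cost
#    scales with the size of the delta, not with the full model.
optimizer = torch.optim.AdamW(delta_params, lr=1e-3)

total = sum(p.numel() for p in model.parameters())
tuned = sum(p.numel() for p in delta_params)
print(f"tuning {tuned}/{total} parameters ({100 * tuned / total:.2f}%)")
```

Other delta-tuning designs surveyed in the paper (for example, adapters, prefix-tuning or low-rank reparameterizations) follow the same pattern but introduce small new modules as the delta instead of selecting existing parameters.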
Pages: 220 / +
Page count: 25