Low-Rank and Locality Constrained Self-Attention for Sequence Modeling

Cited by: 22
Authors
Guo, Qipeng [1 ,2 ]
Qiu, Xipeng [1 ,2 ]
Xue, Xiangyang [1 ,2 ]
Zhang, Zheng [3 ,4 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] NYU Shanghai, Shanghai 200122, Peoples R China
[4] AWS Shanghai AI Lab, Shanghai 200000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sparse matrices; Bit error rate; Matrix decomposition; Linguistics; Task analysis; Natural language processing; Data models; Sequence modeling; self-attention; transformer; deep learning
DOI
10.1109/TASLP.2019.2944078
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
The self-attention mechanism has become increasingly popular in natural language processing (NLP) applications. Recent studies show that the Transformer architecture, which relies mainly on the attention mechanism, achieves great success on large datasets. However, its generalization ability is weaker than that of CNNs and RNNs on many moderate-sized datasets. We attribute this to the unsuitable inductive bias of the self-attention structure. In this paper, we regard self-attention as a matrix decomposition problem and propose an improved self-attention module by introducing two linguistic constraints: low-rank and locality. We further develop low-rank attention and band attention to parameterize the self-attention mechanism under these two constraints. Experiments on several real NLP tasks show that our model outperforms the vanilla Transformer and other self-attention models on moderate-sized datasets. Additionally, evaluation on a synthetic task gives a more detailed understanding of the working mechanisms of different architectures.
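
The record does not include an implementation; below is a minimal PyTorch sketch of the two constraints the abstract names, combining a rank-limited score matrix (queries and keys projected into a small r-dimensional space) with a diagonal band mask for locality. Module and parameter names (LowRankBandAttention, rank, bandwidth) are illustrative assumptions and not the authors' code, which may parameterize the two constraints differently.

# Hedged sketch: low-rank scores via a narrow query/key projection,
# locality via a diagonal band mask. Not the paper's implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankBandAttention(nn.Module):
    def __init__(self, d_model: int, rank: int = 32, bandwidth: int = 8):
        super().__init__()
        # Queries and keys live in an r-dimensional space, so the
        # (unmasked) score matrix has rank at most `rank`.
        self.q_proj = nn.Linear(d_model, rank)
        self.k_proj = nn.Linear(d_model, rank)
        self.v_proj = nn.Linear(d_model, d_model)
        self.bandwidth = bandwidth
        self.scale = 1.0 / math.sqrt(rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale  # (batch, L, L)

        # Locality: keep only a diagonal band of half-width `bandwidth`.
        L = x.size(1)
        idx = torch.arange(L, device=x.device)
        band = (idx[None, :] - idx[:, None]).abs() <= self.bandwidth  # (L, L) bool
        scores = scores.masked_fill(~band, float('-inf'))

        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)

# Usage: a toy batch of 2 sequences of length 16 with model width 64.
x = torch.randn(2, 16, 64)
out = LowRankBandAttention(d_model=64, rank=8, bandwidth=4)(x)
print(out.shape)  # torch.Size([2, 16, 64])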
Pages: 2213-2222 (10 pages)