Nonrecurrent Neural Structure for Long-Term Dependence

Cited by: 31
Authors
Zhang, Shiliang [1]
Liu, Cong [2 ]
Jiang, Hui [3 ]
Wei, Si [2 ]
Dai, Lirong [1 ]
Hu, Yu [2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Peoples R China
[2] IFLYTEK Res, Hefei 230088, Peoples R China
[3] York Univ, Lassonde Sch Engn, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
Keywords
CFSMN; Deep neural networks; feedforward sequential memory networks; language modeling; speech recognition; NETWORKS; ERROR;
DOI
10.1109/TASLP.2017.2672398
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In this paper, we propose a novel neural network structure, namely the feedforward sequential memory network (FSMN), to model long-term dependence in time series without using recurrent feedback. The proposed FSMN is a standard fully connected feedforward neural network equipped with some learnable memory blocks in its hidden layers. The memory blocks use a tapped-delay line structure to encode long context information into a fixed-size representation as a short-term memory mechanism, which is somewhat similar to time-delay neural network layers. We have evaluated FSMNs on several standard benchmark tasks, including speech recognition and language modeling. Experimental results show that FSMNs outperform conventional recurrent neural networks (RNNs) while being learned much more reliably and faster when modeling sequential signals such as speech or language. Moreover, we also propose a compact feedforward sequential memory network (cFSMN) by combining the FSMN with low-rank matrix factorization, and we make a slight modification to the encoding method used in FSMNs in order to further simplify the network architecture. On the Switchboard speech recognition task, the proposed cFSMN structures reduce the model size by 60% and speed up learning by more than seven times, while the model still significantly outperforms the popular bidirectional LSTMs under both frame-level cross-entropy training and MMI-based sequence training.
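To make the memory-block idea in the abstract concrete, the following is a minimal NumPy sketch of a bidirectional, vectorized FSMN-style memory block, i.e., a tapped-delay line over a layer's hidden activations. The function name fsmn_memory, the tap orders, and the toy dimensions are illustrative assumptions rather than code from the paper; the cFSMN variant described above would apply the same taps to a low-rank linear projection of the hidden layer instead of to the hidden activations directly.

```python
import numpy as np

def fsmn_memory(h, a, c):
    """Bidirectional FSMN-style memory block (vectorized form, a sketch).

    h : (T, D) hidden activations of one layer over T time steps
    a : (N1 + 1, D) learnable look-back coefficients (taps 0 .. N1)
    c : (N2, D) learnable look-ahead coefficients (taps 1 .. N2)

    Returns h_tilde of shape (T, D): a fixed-size encoding of the
    surrounding context at every time step, which the next layer
    consumes together with h itself.
    """
    T, D = h.shape
    n1, n2 = a.shape[0] - 1, c.shape[0]
    h_tilde = np.zeros_like(h)
    for t in range(T):
        # look-back taps: current frame plus up to N1 past frames
        for i in range(min(n1, t) + 1):
            h_tilde[t] += a[i] * h[t - i]
        # look-ahead taps: up to N2 future frames
        for j in range(1, min(n2, T - 1 - t) + 1):
            h_tilde[t] += c[j - 1] * h[t + j]
    return h_tilde

# Toy usage (shapes and orders are illustrative): T=20 frames of a
# D=8 dimensional hidden layer, N1=4 look-back and N2=2 look-ahead taps.
rng = np.random.default_rng(0)
h = rng.standard_normal((20, 8))
a = rng.standard_normal((5, 8))
c = rng.standard_normal((2, 8))
h_tilde = fsmn_memory(h, a, c)  # (20, 8)
```

Because h_tilde is computed from a fixed window of past and future frames rather than from a recurrent state, the whole sequence can be processed without the sequential dependency that slows RNN/LSTM training, which is the source of the reported speed and reliability gains.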
Pages: 871-884
Number of pages: 14