Convolutional Neural Networks for Speech Recognition

Times cited: 1563
Authors
Abdel-Hamid, Ossama [1]
Mohamed, Abdel-Rahman [2]
Jiang, Hui [1]
Deng, Li [3]
Penn, Gerald [2]
Yu, Dong [3]
Affiliations
[1] York Univ, Lassonde Sch Engn, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S, Canada
[3] Microsoft Res, Redmond, WA 98052 USA
Keywords
Convolution; convolutional neural networks; Limited Weight Sharing (LWS) scheme; pooling; connectionist feature extraction; model; features
DOI
10.1109/TASLP.2014.2339736
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure of CNNs, namely local connectivity, weight sharing, and pooling, exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important for dealing with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.
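The abstract credits local connectivity, weight sharing, and pooling along the frequency axis for a degree of shift invariance, and proposes limited weight sharing (LWS). The sketch below illustrates these ideas on one toy frame of filterbank energies; it is not the authors' implementation. Everything concrete here is an assumption chosen for illustration: the hypothetical function names, the 3-tap filter, the ReLU nonlinearity with a negative bias (used only to keep the toy response sparse), non-overlapping max pooling, and the reading of LWS as "the convolution units that feed one pooling band share a filter, while different pooling bands get different filters".

```python
from typing import List, Sequence


def relu(v: float) -> float:
    # Nonlinearity is an illustrative assumption, not taken from the paper.
    return v if v > 0.0 else 0.0


def conv_freq(x: Sequence[float], w: Sequence[float], b: float = 0.0) -> List[float]:
    """Full weight sharing: one filter w slides over every frequency band ("valid" mode)."""
    k = len(w)
    return [relu(sum(w[j] * x[i + j] for j in range(k)) + b)
            for i in range(len(x) - k + 1)]


def max_pool(h: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling along frequency; a trailing partial window is dropped."""
    return [max(h[i:i + size]) for i in range(0, len(h) - size + 1, size)]


def lws_layer(x: Sequence[float], filters: Sequence[Sequence[float]],
              pool_size: int, b: float = 0.0) -> List[float]:
    """Hypothetical limited-weight-sharing layer: the pool_size convolution units
    feeding pooling band s all use filters[s]; each band has its own filter."""
    k = len(filters[0])
    n_sections = (len(x) - k + 1) // pool_size
    if len(filters) != n_sections:
        raise ValueError(f"expected {n_sections} filters, got {len(filters)}")
    out = []
    for s, w in enumerate(filters):
        start = s * pool_size  # first frequency band covered by section s
        units = [relu(sum(w[j] * x[start + i + j] for j in range(k)) + b)
                 for i in range(pool_size)]
        out.append(max(units))  # pool only within this section
    return out


if __name__ == "__main__":
    w = [1.0, 2.0, 1.0]           # toy "bump" detector (hypothetical weights)
    x = [0, 0, 1, 3, 1, 0, 0, 0]  # toy 8-band spectral frame (hypothetical data)
    x_shift1 = [0] + x[:-1]       # same pattern moved up by one band
    print(max_pool(conv_freq(x, w, b=-6.0), 2))
    print(max_pool(conv_freq(x_shift1, w, b=-6.0), 2))
    print(lws_layer(x, [w, w], pool_size=3))
```

Running it prints `[0.0, 2.0, 0.0]` twice: the one-band shift keeps the peak response inside the same pooling window, so the pooled output is unchanged. A two-band shift (`[0, 0] + x[:-2]`) crosses a pooling boundary and yields `[0.0, 0.0, 2.0]`, which is why the invariance holds only for small shifts. With identical filters in every section, `lws_layer` reduces to full weight sharing followed by pooling (here `[8.0, 5.0]`); giving each section its own filter is what lets different frequency regions learn different detectors.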
Pages: 1533-1545
Number of pages: 13