Deep Neural Networks for Acoustic Modeling in Speech Recognition

Cited by: 6963
Authors
Hinton, Geoffrey
Deng, Li [1, 2, 3, 4]
Yu, Dong [2]
Dahl, George E.
Mohamed, Abdel-rahman [5]
Jaitly, Navdeep [8]
Senior, Andrew
Vanhoucke, Vincent [6]
Nguyen, Patrick [2, 7]
Sainath, Tara N.
Kingsbury, Brian
Affiliations
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
[2] MSR, Redmond, WA USA
[3] MIT, ATR Interpreting Telecommun Res Labs, Kyoto, Japan
[4] Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China
[5] Katholieke Univ Leuven, ESAT PSI Speech Grp, Louvain, Belgium
[6] Speech R&D Team, Nuance, Menlo Pk, CA USA
[7] Panason Speech Technol Lab, Santa Barbara, CA USA
[8] Capr Pharmaceut, Montreal, PQ, Canada
Keywords
REPRESENTATIONS; FEATURES; EXPERTS; NETS
DOI
10.1109/MSP.2012.2205597
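Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract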
Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition. © 2012 IEEE.
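The abstract's description of the neural-network alternative (several frames of coefficients in, posterior probabilities over HMM states out) can be sketched as follows. This is a minimal illustration only, not the systems the article reports on: the layer sizes, the context width, the logistic hidden units, the edge-padding rule, and all function names are assumptions chosen for the example, and the weights are random and untrained.

```python
import math
import random


def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Frames at the utterance edges are repeated (an assumed padding rule).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid frame indices
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases; a real system would train these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def state_posteriors(x, layers):
    """Feed-forward pass: logistic hidden layers, softmax over HMM states."""
    h = x
    for layer in layers[:-1]:
        h = [1.0 / (1.0 + math.exp(-a)) for a in affine(h, layer)]
    return softmax(affine(h, layers[-1]))


# Hypothetical sizes: 13 coefficients per frame, 5 frames of context per side,
# two hidden layers of 32 units, 6 HMM states.
N_COEFFS, CONTEXT, N_HIDDEN, N_STATES = 13, 5, 32, 6

rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_COEFFS)] for _ in range(20)]
inputs = stack_frames(frames, CONTEXT)

sizes = [N_COEFFS * (2 * CONTEXT + 1), N_HIDDEN, N_HIDDEN, N_STATES]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

# One posterior distribution over HMM states per input frame.
posteriors = [state_posteriors(x, layers) for x in inputs]
```

Each entry of `posteriors` is a probability distribution over the assumed HMM states for one frame, which is the quantity the abstract contrasts with a GMM's per-state fit.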
Pages: 82-97
Number of pages: 16