Deep Learning Based Speech Recognition: Current Status and Prospects

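Cited by: 68
Authors
Dai Lirong
Zhang Shiliang
Huang Zhiying
Affiliation
[1] National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
Funding
National Key Research and Development Program of China;
Keywords
deep learning; deep neural network; speech recognition; speaker adaptation;
DOI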
10.16337/j.1004-9037.2017.02.002
Chinese Library Classification (CLC) number
TN912.34 [Speech recognition and equipment];
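Abstract
This paper first gives a brief introduction to the history and basic concepts of deep learning. It then reviews research progress in deep learning based speech recognition over recent years, organized under five topics: acoustic model training criteria, deep learning based acoustic model architectures, training efficiency optimization for deep learning based acoustic models, speaker adaptation of deep learning based acoustic models, and deep learning based end-to-end speech recognition. Finally, it discusses possible future research directions for deep learning based speech recognition.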
Pages: 221-231
Number of pages: 11