Target Value Optimization for Deep Neural Networks in Speech Recognition

Cited by: 4
Authors
陈梦喆
张晴晴
潘接林
颜永红
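Affiliation
[1] Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences
Keywords
speech recognition; deep neural network; forward-backward algorithm; target value optimization
DOI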
10.15961/j.jsuese.2016.01.025
Chinese Library Classification number
TN912.34 [Speech recognition and equipment]
Discipline classification code
0711
Abstract
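When training deep neural network (DNN) acoustic models, the target values obtained by forced alignment cannot accurately represent the actual state of the speech. To address this problem, a method is proposed that uses the forward-backward algorithm to obtain target values that are not 0-1 distributed. Targets from forced alignment are unreasonable for two reasons: the model used for forced alignment may not fully match the utterance being processed, and the continuity of articulation makes transition boundaries hard to separate. The new target values express the probability with which a given frame belongs to each of the neighboring states, describe the transitions between modeling units in more detail, better restore the original character of the speech, and improve model robustness. In addition, to balance model robustness against the discriminability of modeling units, a window is applied to the target values produced by the algorithm. Experiments were conducted in the Chinese customer-service question-answering domain. Experiments on a small amount of data confirmed that the target values strongly affect training and were used to select the window width. Finally, the training data was increased to 60, 80, and 100 h. The results show that models trained with the new target value optimization method achieve better recognition performance, with relative character error rate reductions of 1.10% to 3.65%. Multiple groups of experiments verify that the new method is effective for model training and remains effective as the amount of training data increases.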
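The idea in the abstract can be illustrated with a minimal NumPy sketch: run the forward-backward algorithm over a small left-to-right HMM to obtain per-frame state occupancy probabilities, then restrict them to a window around the forced-alignment label. The record does not give the paper's exact formulation, so everything concrete here is an assumption made for illustration: the function names `occupancy_posteriors` and `window_targets`, the toy likelihoods and transition matrix, and in particular the window definition (keep only states within `half_width` positions of the aligned state, then renormalize).

```python
import numpy as np

def occupancy_posteriors(obs_lik, trans):
    """Forward-backward state occupancy gamma[t, s] for one utterance.

    obs_lik: (T, S) strictly positive likelihood of each frame under each state.
    trans:   (S, S) transition probabilities.
    Assumption: the path starts in state 0 and ends in state S-1.
    """
    T, S = obs_lik.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0, 0] = obs_lik[0, 0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * obs_lik[t]
        alpha[t] /= alpha[t].sum()   # per-frame scaling avoids underflow
    beta[-1, -1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (obs_lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()     # scaling cancels in the row normalization below
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def window_targets(gamma, align, half_width):
    """Keep posterior mass only near the forced-alignment state (assumed window)."""
    T, S = gamma.shape
    align = np.asarray(align)
    mask = np.abs(np.arange(S)[None, :] - align[:, None]) <= half_width
    windowed = gamma * mask
    row_sum = windowed.sum(axis=1, keepdims=True)
    one_hot = np.eye(S)[align]
    # Fall back to the hard 0-1 label if the window removed all probability mass.
    safe_sum = np.where(row_sum > 0, row_sum, 1.0)
    return np.where(row_sum > 0, windowed / safe_sum, one_hot)

# Toy data (hypothetical): 6 frames, 3 left-to-right states.
rng = np.random.default_rng(0)
obs_lik = rng.uniform(0.1, 1.0, size=(6, 3))
trans = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.6, 0.4],
                  [0.0, 0.0, 1.0]])
align = np.array([0, 0, 1, 1, 2, 2])   # hard labels from a forced alignment

gamma = occupancy_posteriors(obs_lik, trans)
soft = window_targets(gamma, align, half_width=1)
print(np.round(soft, 3))
```

With `half_width=0` the sketch reduces to the usual 0-1 forced-alignment targets, while a larger window lets frames near a boundary share probability with neighboring states. This mirrors the trade-off between robustness and discriminability that the abstract describes.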
Pages: 166-172
Page count: 7