Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition

Cited by: 19
Authors
Kim, D. K. [1]
Gales, M. J. F. [2]
Affiliations
[1] Chonnam Natl Univ, Dept Elect & Comp Engn, Kwangju 500757, South Korea
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 2
Keywords
Adaptive training; noise robustness; speaker adaptation; speech recognition; MODELS;
DOI
10.1109/TASL.2010.2047756
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Adaptive training is a widely used technique for building speech recognition systems on nonhomogeneous training data. Recently, there has been interest in applying these approaches to situations where there are significant levels of background noise in the training data. Various schemes for adaptive training are based on noise- or speaker-specific transforms of features to yield estimates of the clean speech. However, when there are high levels of background noise, these clean speech estimates may be poor, resulting in degraded performance. In this paper, a new approach for adaptive training on noise-corrupted training data is presented. It extends a popular form of linear transform for model-based adaptation and adaptive training, constrained MLLR (CMLLR), to reflect additional uncertainty from noise-corrupted observations. This new form of adaptation transform is called noisy CMLLR (NCMLLR). NCMLLR uses a modified version of the generative model between clean speech and the noisy observation, similar to factor analysis (FA). However, in contrast to FA, here the generative model describes an adaptation transform rather than a covariance matrix structure. The use of NCMLLR for adaptive training using an expectation-maximization approach is described. Discriminative adaptive training with NCMLLR, based on the minimum phone error criterion, is also described. Experimental results comparing NCMLLR with standard adaptive training schemes are given on a noise-corrupted version of Resource Management, the ARPA 1994 CSRNAB Spoke 10 task, and in-car recorded data.
Pages: 315-325
Number of pages: 11