Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition

Cited by: 53
Authors
Deng, L [1]
Droppo, J [1]
Acero, A [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052, USA
Source
IEEE Transactions on Speech and Audio Processing | 2003 / Vol. 11 / No. 6
DOI
10.1109/TSA.2003.818076
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We describe a novel algorithm for recursive estimation of nonstationary acoustic noise which corrupts clean speech, and a successful application of the algorithm in the speech feature enhancement framework of noise-normalized SPLICE for robust speech recognition. The noise estimation algorithm makes use of a nonlinear model of the acoustic environment in the cepstral domain. Central to the algorithm is the innovative iterative stochastic approximation technique that improves piecewise linear approximation to the nonlinearity involved and that subsequently increases the accuracy for noise estimation. We report comprehensive experiments on SPLICE-based, noise-robust speech recognition for the AURORA2 task using the results of iterative stochastic approximation. The effectiveness of the new technique is demonstrated in comparison with a more traditional, MMSE noise estimation algorithm under otherwise identical conditions. The word error rate reduction achieved by iterative stochastic approximation for recursive noise estimation in the framework of noise-normalized SPLICE is 27.9% for the multicondition training mode, and 67.4% for the clean-only training mode, respectively, compared with the results using the standard cepstra with no speech enhancement and using the baseline HMM supplied by AURORA2. These represent the best performance in the clean-training category of the September-2001 AURORA2 evaluation. The relative error rate reduction achieved by using the same noise estimate is increased to 48.40% and 76.86%, respectively, for the two training modes after using a better designed HMM system. The experimental results demonstrated the crucial importance of using the newly introduced iterations in improving the earlier stochastic approximation technique, and showed sensitivity of the noise estimation algorithm's performance to the forgetting factor embedded in the algorithm.
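The abstract's point that iterating improves a linear approximation to the nonlinearity can be illustrated with a scalar toy example. This is only a sketch under stated assumptions, not the paper's algorithm: it assumes a log-sum mixing model `y = log(exp(x) + exp(n))` for a single log-energy value (the paper works in the cepstral domain), it assumes the clean speech value `x` is known, and it omits the recursive update and forgetting factor entirely. The function names `observe` and `estimate_noise` are hypothetical.

```python
import math

def observe(x, n):
    """Noisy log-energy from clean speech x and noise n (assumed log-sum model)."""
    return math.log(math.exp(x) + math.exp(n))

def estimate_noise(y, x, n0, iterations):
    """Refine a noise estimate by repeatedly relinearizing observe() around it."""
    n = n0
    for _ in range(iterations):
        # Slope of observe() with respect to n at the current expansion point.
        slope = math.exp(n) / (math.exp(x) + math.exp(n))
        # Solve the linearized equation y = observe(x, n) + slope * (n_new - n).
        n = n + (y - observe(x, n)) / slope
    return n

# Toy values chosen for illustration only.
x_true, n_true = 2.0, 1.0
y = observe(x_true, n_true)

one_pass = estimate_noise(y, x_true, n0=0.0, iterations=1)
six_passes = estimate_noise(y, x_true, n0=0.0, iterations=6)

# Absolute error after a single linearization versus after six relinearizations.
print(abs(one_pass - n_true), abs(six_passes - n_true))
```

A single linearization around a poor starting point leaves a visible error, while moving the expansion point to each new estimate and linearizing again drives the error toward zero in this toy setting.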
Pages: 568-580
Page count: 13