Text-independent speaker verification for real fast-varying noisy environments

Cited by: 7
Authors
Ganchev T. [1]
Potamitis I. [1]
Fakotakis N. [1]
Kokkinakis G. [1]
Affiliation
[1] Wire Communications Laboratory, University of Patras
Keywords
Feature extraction; Speaker recognition; Speaker verification
DOI
10.1023/B:IJST.0000037072.36778.9e
Abstract
This paper investigates speaker verification in real-world noisy environments by comparing a novel feature extraction process, designed to suppress time-varying noise, with a fine-tuned spectral subtraction method. The proposed process approximates the spectral magnitudes of the clean speech and of the noise with mixtures of Gaussian probability density functions (pdfs), fitted with the Expectation-Maximization (EM) algorithm. A Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimation task. The estimated spectral magnitude is finally incorporated into the Mel-Frequency Cepstral Coefficients (MFCC) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology, USA) Speaker Recognition Evaluation. A comparative study on real-world noise types shows a significant performance gain over both the baseline speech features and the spectral subtraction enhancement method. In a passing-by aircraft scenario, absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) relative to the MFCCs, and by more than 13% at -5 dB SNR relative to the spectral subtraction version.
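The closed-form MMSE step mentioned in the abstract can be illustrated with a textbook special case. The sketch below is not the authors' derivation: it assumes a scalar additive model y = x + n in which the clean value x and the noise n are independent and each follows a Gaussian mixture whose parameters have already been fitted (for example by EM). The function names and the numeric mixture parameters are hypothetical, chosen only for illustration.

```python
import math


def gauss_pdf(y, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_estimate(y, speech_gmm, noise_gmm):
    """Posterior mean E[x | y] under the ASSUMED model y = x + n.

    x and n are independent Gaussian mixtures; each mixture is a list of
    (weight, mean, variance) tuples. This is an illustrative sketch, not
    the estimator derived in the paper.
    """
    num = 0.0
    den = 0.0
    for ws, ms, vs in speech_gmm:
        for wn, mn, vn in noise_gmm:
            # For this component pair, y ~ N(ms + mn, vs + vn).
            lik = ws * wn * gauss_pdf(y, ms + mn, vs + vn)
            # Conditional mean of x for this pair, with gain vs / (vs + vn).
            cond_mean = ms + vs / (vs + vn) * (y - ms - mn)
            num += lik * cond_mean
            den += lik
    # Weighted average of the per-pair conditional means.
    return num / den


# Hypothetical, hand-picked mixture parameters (not taken from the paper).
speech_gmm = [(0.6, 1.0, 0.5), (0.4, 4.0, 1.0)]
noise_gmm = [(1.0, 0.5, 0.25)]
x_hat = mmse_estimate(3.0, speech_gmm, noise_gmm)
```

With a single Gaussian on each side, the estimate reduces to the familiar linear (Wiener-like) shrinkage of the observation toward the prior mean. With several components, it becomes a posterior-weighted blend of such linear estimates, which is one way a mixture model can adapt to noise whose statistics change over time.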
Pages: 281-292
Page count: 11