Text-independent speaker verification for real fast-varying noisy environments

Cited by: 7

Authors:

Ganchev T. [1]

Potamitis I. [1]

Fakotakis N. [1]

Kokkinakis G. [1]

Affiliation:

[1] Wire Communications Laboratory, University of Patras

Source:

International Journal of Speech Technology | 2004 / Vol. 7 / No. 4

Keywords:

Feature extraction; Speaker recognition; Speaker verification

DOI:

10.1023/B:IJST.0000037072.36778.9e

Abstract:

In an investigation of speaker verification in real-world noisy environments, a novel feature extraction process suited to suppressing time-varying noise is compared with a fine-tuned spectral subtraction method. The proposed feature extraction process approximates the spectral magnitudes of clean speech and of noise with mixtures of Gaussian probability density functions (pdfs), estimated with the Expectation-Maximization (EM) algorithm. The Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimation task. Finally, the estimated spectral magnitude is incorporated into the Mel-Frequency Cepstral Coefficient (MFCC) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology, USA) Speaker Recognition Evaluation. A comparative study on real-world noise types shows a significant performance gain for the proposed technique over both the baseline speech features and the spectral subtraction enhancement method. In a passing-by aircraft scenario, absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) relative to the MFCCs, and by more than 13% at -5 dB SNR relative to the spectral subtraction version.
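The closed-form MMSE step summarized above can be illustrated with a minimal sketch. It assumes, for illustration only, that the noisy spectral magnitude y in one frequency bin is the sum of independent clean-speech and noise magnitudes (x and n), each modelled by a one-dimensional Gaussian mixture whose parameters were already fitted with EM. This additivity is a simplifying assumption (magnitudes do not add exactly because of phase), and the paper's actual derivation may differ. Under that assumption the posterior mean E[x | y] is a responsibility-weighted sum of per-component Gaussian conditional means. The names `log_gauss` and `mmse_estimate` and the (weight, mean, variance) tuple layout are hypothetical.

```python
import math


def log_gauss(y, mean, var):
    """Log of the univariate normal density N(y; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var


def mmse_estimate(y, speech_gmm, noise_gmm):
    """Posterior mean E[x | y] for y = x + n, with x and n independent.

    Assumption (not from the paper): additive magnitudes in one frequency bin.
    speech_gmm, noise_gmm: hypothetical lists of (weight, mean, variance)
    tuples, e.g. fitted with EM on clean-speech and noise magnitudes.
    """
    log_post = []    # unnormalised log responsibility of each (k, j) pair
    cond_means = []  # E[x | y, k, j] for each pair
    for w_k, mu_k, var_k in speech_gmm:
        for v_j, mu_j, var_j in noise_gmm:
            var_y = var_k + var_j  # variance of y under component pair (k, j)
            log_post.append(math.log(w_k) + math.log(v_j)
                            + log_gauss(y, mu_k + mu_j, var_y))
            # Gaussian conditioning: shrink (y - mu_j) towards the speech mean mu_k
            cond_means.append(mu_k + (var_k / var_y) * (y - mu_k - mu_j))
    # Normalise responsibilities in the log domain to avoid underflow
    top = max(log_post)
    resp = [math.exp(lp - top) for lp in log_post]
    total = sum(resp)
    return sum(r * m for r, m in zip(resp, cond_means)) / total


# Toy usage with made-up parameters for a single frequency bin
speech = [(0.6, 2.0, 0.5), (0.4, 6.0, 1.0)]  # two-component clean-speech GMM
noise = [(1.0, 1.0, 0.3)]                     # single-Gaussian noise model
print(mmse_estimate(4.5, speech, noise))
```

In a front-end built along these lines, the estimate would be computed per frame and per frequency bin and the resulting magnitudes passed to the usual mel filterbank and DCT stages of the MFCC computation; the log-domain normalisation keeps the responsibilities stable when y lies far from every component mean.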

Pages: 281-292

Number of pages: 11

References (24 in total; [1]-[10] listed here):

[1] Assaleh K.T., Mammone R.J., Robust cepstral feature for speaker identification, Proceedings of the IEEE ICASSP'94, 1, pp. 129-132, (1994)

[2] Beaufays F., Weintraub M., Model transformation for robust speaker recognition from telephone data, Proceedings of the IEEE ICASSP'97, 2, pp. 1063-1066, (1997)

[3] Berouti M., Schwartz R., Makhoul J., Enhancement of speech corrupted by acoustic noise, Proceedings of the IEEE ICASSP'79, 1, pp. 208-211, (1979)

[4] Boll S.F., Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27, pp. 113-120, (1979)

[5] Chatzi I., Fakotakis N., Kokkinakis G., Greek speech database for creation of voice driven teleservices, Proceedings of the EUROSPEECH'97, 4, pp. 1755-1758, (1997)

[6] Demuth H., Beale M., Neural Networks Toolbox, User's Guide, Version 3, (1998)

[7] Drygajlo A., El-Maliki M., Speaker verification in noisy environments with combined spectral subtraction and missing feature theory, Proceedings of the IEEE ICASSP'98, 1, pp. 121-124, (1998)

[8] Ganchev T., Fakotakis N., Kokkinakis G., Text-independent speaker verification based on probabilistic neural networks, Proceedings of the Acoustics 2002, pp. 159-166, (2002)

[9] Ganchev T., Fakotakis N., Kokkinakis G., A speaker verification system based on probabilistic neural networks, 2002 NIST Speaker Recognition Evaluation, Results CD, Workshop Presentations & Final Release of Results, (2002)

[10] Gish H., Ng K., Rohlicek J.R., Robust mapping of noisy speech parameters for HMM word spotting, Proceedings of the IEEE ICASSP'92, 2, pp. 109-112, (1992)
