An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers

Cited by: 445
Authors
Jensen, Jesper [1, 2]
Taal, Cees H. [3]
Affiliations
[1] Aalborg Univ, DK-9100 Aalborg, Denmark
[2] Oticon AS, DK-2765 Smorum, Denmark
[3] Quby Labs, NL-1096 CJ Amsterdam, Netherlands
Keywords
Modulated noise sources; noise reduction; objective distortion measures; speech enhancement; speech intelligibility prediction; envelope power ratio; temporal envelope; perception; sentences; masking
DOI
10.1109/TASLP.2016.2585878
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Intelligibility listening tests are necessary during development and evaluation of speech processing algorithms, despite the fact that they are expensive and time-consuming. In this paper, we propose a monaural intelligibility prediction algorithm, which has the potential of replacing some of these listening tests. The proposed algorithm shows similarities to the short-time objective intelligibility (STOI) algorithm, but works for a larger range of input signals. In contrast to STOI, extended STOI (ESTOI) does not assume mutual independence between frequency bands. ESTOI also incorporates spectral correlation by comparing complete 400-ms length spectrograms of the noisy/processed speech and the clean speech signals. As a consequence, ESTOI is also able to accurately predict the intelligibility of speech contaminated by temporally highly modulated noise sources in addition to noisy signals processed with time-frequency weighting. We show that ESTOI can be interpreted in terms of an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, i.e., a ranking of spectrogram features according to their importance to intelligibility. A free MATLAB implementation of the algorithm is available for noncommercial use at http://kom.aau.dk/~jje/.
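The abstract describes comparing whole short-time spectrogram segments of clean and degraded speech without treating frequency bands as independent. A minimal sketch of that idea follows. It is not the authors' MATLAB implementation, and everything in it is an assumption for illustration: the helper names (`normalize_segment`, `segment_similarity`), the row-then-column normalization order, and the toy 3-band by 3-frame segments. The front end that would produce band envelopes from audio is omitted entirely; consult the paper for the actual definition.

```python
import math


def _center_and_scale(vec, eps=1e-12):
    """Subtract the mean from vec and scale the result to (near) unit norm."""
    mean = sum(vec) / len(vec)
    centered = [v - mean for v in vec]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / (norm + eps) for c in centered]


def normalize_segment(segment):
    """Normalize a spectrogram segment given as a list of bands (rows),
    each a list of per-frame envelope values.

    Assumed order: each band is normalized over time first, then each
    frame is normalized across bands. Returns a list of frames (columns).
    """
    rows = [_center_and_scale(row) for row in segment]
    return [_center_and_scale(list(col)) for col in zip(*rows)]


def segment_similarity(clean, degraded):
    """Average, over frames, of the inner product between the normalized
    clean and degraded spectral vectors (a value in roughly [-1, 1])."""
    clean_cols = normalize_segment(clean)
    degraded_cols = normalize_segment(degraded)
    dots = [
        sum(a * b for a, b in zip(c, d))
        for c, d in zip(clean_cols, degraded_cols)
    ]
    return sum(dots) / len(dots)


# Toy segment: 3 bands x 3 frames (hypothetical values, not real speech).
clean = [[1.0, 2.0, 4.0], [3.0, 1.0, 0.0], [0.0, 5.0, 2.0]]
print(segment_similarity(clean, clean))
```

In a full predictor of this kind, a score like this would be computed for every segment of roughly 400 ms and averaged over the utterance. Because each frame is compared as a vector across all bands, the score depends on the joint spectral shape rather than on each band separately.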
Pages: 2009-2022
Number of pages: 14