An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers

Cited by: 445

Authors:

Jensen, Jesper ^{[1,2]}

Taal, Cees H. ^{[3]}

Affiliations:

[1] Aalborg Univ, DK-9100 Aalborg, Denmark

[2] Oticon AS, DK-2765 Smorum, Denmark

[3] Quby Labs, NL-1096 CJ Amsterdam, Netherlands

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2016 / Vol. 24 / No. 11

Keywords:

Modulated noise sources; noise reduction; objective distortion measures; speech enhancement; speech intelligibility prediction; ENVELOPE POWER RATIO; TEMPORAL ENVELOPE; PERCEPTION; SENTENCES; MASKING;

DOI:

10.1109/TASLP.2016.2585878

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Intelligibility listening tests are necessary during development and evaluation of speech processing algorithms, despite the fact that they are expensive and time consuming. In this paper, we propose a monaural intelligibility prediction algorithm, which has the potential of replacing some of these listening tests. The proposed algorithm shows similarities to the short-time objective intelligibility (STOI) algorithm, but works for a larger range of input signals. In contrast to STOI, extended STOI (ESTOI) does not assume mutual independence between frequency bands. ESTOI also incorporates spectral correlation by comparing complete 400-ms length spectrograms of the noisy/processed speech and the clean speech signals. As a consequence, ESTOI is also able to accurately predict the intelligibility of speech contaminated by temporally highly modulated noise sources in addition to noisy signals processed with time-frequency weighting. We show that ESTOI can be interpreted in terms of an orthogonal decomposition of short-time spectrograms into intelligibility subspaces, i.e., a ranking of spectrogram features according to their importance to intelligibility. A free MATLAB implementation of the algorithm is available for noncommercial use at http://kom.aau.dk/~jje/.

Citation

Pages: 2009-2022

Page count: 14

References: 46 in total

[1] [Anonymous], 1969, S35 ANSI

[2] [Anonymous], 1995, S35 ANSI

[3] [Anonymous], 1990, TIM AC PHON CONT SPE

[4] [Anonymous], P INTERSPEECH