Modelling speaker intelligibility in noise

Cited by: 72
Authors
Barker, Jon [1]
Cooke, Martin [1]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
intelligibility; automatic speech recognition; speech perception; glimpsing; energetic masking; computational modelling;
DOI
10.1016/j.specom.2006.11.003
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This study compared listeners' performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified three keywords in simple 6-word sentences presented in speech-shaped noise at a range of signal-to-noise ratios. Sentence material was provided by 18 male and 16 female speakers. An across-speaker analysis of a number of acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be consistently good predictors of relative intelligibility. A simple measure of degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences for the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model was able to predict not only the intelligibility of individual speakers to a remarkable degree, but could also account for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions. © 2006 Elsevier B.V. All rights reserved.
Pages: 402-417
Number of pages: 16