Modelling speaker intelligibility in noise

Cited by: 72

Authors:

Barker, Jon [1]

Cooke, Martin [1]

Affiliations:

[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England

Source:

SPEECH COMMUNICATION | 2007 / Vol. 49 / Issue 05

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

intelligibility; automatic speech recognition; speech perception; glimpsing; energetic masking; computational modelling;

DOI:

10.1016/j.specom.2006.11.003

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This study compared listeners' performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified three keywords in simple 6-word sentences presented in speech-shaped noise at a range of signal-to-noise ratios. Sentence material was provided by 18 male and 16 female speakers. An across-speaker analysis of a number of acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be consistently good predictors of relative intelligibility. A simple measure of degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences for the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model not only predicted the intelligibility of individual speakers to a remarkable degree, but also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions. © 2006 Elsevier B.V. All rights reserved.


Pages: 402-417

Page count: 16

