The auditory organization of speech and other sources in listeners and computational models

Times cited: 87
Authors
Cooke, M [1]
Ellis, DPW
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
[2] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Keywords
auditory scene analysis; speech perception; streaming; auditory induction; double vowels; robust ASR
DOI
10.1016/S0167-6393(00)00078-9
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech is typically perceived against a background of other sounds. Listeners are adept at extracting target sources from the acoustic mixture reaching the ears. The auditory scene analysis (ASA) account holds that this feat is the result of a two-stage process. In the first stage, sound is decomposed into collections of fragments in several dimensions. Subsequent processes of perceptual organization reassemble these fragments, based on cues indicating a common source of origin, which are interpreted in the light of prior experience. In this way, the decomposed auditory scene is processed to extract coherent evidence for one or more sources. Auditory scene analysis in listeners has been studied for several decades, and recent years have seen a steady accumulation of computational models of perceptual organization. The purpose of this review is to describe the evidence for the nature of auditory organization in listeners and to explore the computational models which have been motivated by such evidence. The primary focus is on speech rather than on sources such as polyphonic music or non-speech ambient backgrounds, although all these domains are equally amenable to auditory organization. The review includes a discussion of the relationship between auditory scene analysis and alternative approaches to sound source segregation. (C) 2001 Elsevier Science B.V. All rights reserved.
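The two-stage account summarized in the abstract (decompose the mixture into fragments, then regroup fragments using cues to a common source) can be illustrated with a toy sketch. This is not one of the models reviewed in the paper. The `Fragment` class, the `group_by_common_f0` function, the 3% tolerance and all F0 values are hypothetical; stage 1 is assumed to have already produced fragments, each with a single fundamental-frequency (F0) estimate; and only one grouping cue (shared F0, i.e. harmonicity) is used, so the role of prior experience is not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical stage-1 output: one time-frequency fragment with its F0 estimate."""
    name: str
    onset_s: float   # fragment onset time in seconds
    f0_hz: float     # estimated fundamental frequency in Hz (assumed given)


def group_by_common_f0(fragments, tolerance=0.03):
    """Stage-2 sketch: greedily assign each fragment (in onset order) to the first
    group whose reference F0 lies within a relative tolerance; otherwise start a new group."""
    groups = []  # each group is a list; its first fragment supplies the reference F0
    for frag in sorted(fragments, key=lambda f: f.onset_s):
        for group in groups:
            ref_f0 = group[0].f0_hz
            if abs(frag.f0_hz - ref_f0) / ref_f0 <= tolerance:
                group.append(frag)
                break
        else:
            # No existing group shares this F0: treat it as evidence for a new source.
            groups.append([frag])
    return groups


# Toy mixture: two interleaved "voices" near 120 Hz and 210 Hz (made-up values).
fragments = [
    Fragment("a", 0.00, 120.0),
    Fragment("b", 0.05, 210.0),
    Fragment("c", 0.10, 121.5),
    Fragment("d", 0.20, 208.0),
]

for group in group_by_common_f0(fragments):
    print([f.name for f in group])
```

Run as written, this prints `['a', 'c']` and then `['b', 'd']`: the fragments are regrouped into two putative sources by F0 agreement alone. Comparing each fragment against a group's first member keeps the sketch short; a fuller model would more plausibly track F0 over time and combine several cues (onset, spatial location, continuity) before committing a fragment to a source.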
Pages: 141-177
Page count: 37
References
164 entries in total
[1] Bregman, AS, 1990, AUDITORY SCENE ANAL, P411, DOI 10.1121/1.408434; DOI 10.7551/MITPRESS/1486.001.0001
[2] [Anonymous], THESIS STANFORD U
[3] [Anonymous], 2012, ROBUSTNESS AUTOMATIC
[4] Anstis, S; Saida, S. ADAPTATION TO AUDITORY STREAMING OF FREQUENCY-MODULATED TONES [J]. JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 1985, 11(3): 257-271
[5] Assmann, PF; Summerfield, Q. MODELING THE PERCEPTION OF CONCURRENT VOWELS - VOWELS WITH DIFFERENT FUNDAMENTAL FREQUENCIES [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 88(2): 680-697
[6] Assmann, PF; Summerfield, Q. THE CONTRIBUTION OF WAVE-FORM INTERACTIONS TO THE PERCEPTION OF CONCURRENT VOWELS [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95(1): 471-484
[7] Assmann, PF, IN PRESS AUDITORY BA
[8] Bailey, PJ, 1977, SR5152 HASK LABS
[9] Beauvois, MW; Meddis, R. A COMPUTER-MODEL OF AUDITORY STREAM SEGREGATION [J]. QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY SECTION A-HUMAN EXPERIMENTAL PSYCHOLOGY, 1991, 43(3): 517-541
[10] Beauvois, MW; Meddis, R. Computer simulation of auditory stream segregation in alternating-tone sequences [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 99(4): 2270-2280