A cocktail party with a cortical twist: How cortical mechanisms contribute to sound segregation

Cited by: 72
Authors
Elhilali, Mounya [1]
Shamma, Shihab A. [2]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
[2] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
DOI
10.1121/1.3001672
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Sound systems and speech technologies can benefit greatly from a deeper understanding of how the auditory system, and particularly the auditory cortex, is able to parse complex acoustic scenes into meaningful auditory objects and streams under adverse conditions. In the current work, a biologically plausible model of this process is presented, where the role of cortical mechanisms in organizing complex auditory scenes is explored. The model consists of two stages: (i) a feature analysis stage that maps the acoustic input into a multidimensional cortical representation and (ii) an integrative stage that recursively builds up expectations of how streams evolve over time and reconciles its predictions with the incoming sensory input by sorting it into different clusters. This approach yields a robust computational scheme for speaker separation under conditions of speech or music interference. The model can also emulate the archetypal streaming percepts of tonal stimuli that have long been tested in human subjects. The implications of this model are discussed with respect to the physiological correlates of streaming in the cortex as well as the role of attention and other top-down influences in guiding sound organization. (C) 2008 Acoustical Society of America. [DOI: 10.1121/1.3001672]
Pages: 3751-3771
Page count: 21
Related papers
108 entries in total
[91]   Integration and segregation in auditory scene analysis [J].
Sussman, ES .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2005, 117 (03) :1285-1298
[92]   Spectral processing in the auditory cortex [J].
Sutter, ML .
AUDITORY SPECTRAL PROCESSING, 2005, 70 :253-298
[93]   CROSSING OF AUDITORY STREAMS [J].
TOUGAS, Y ;
BREGMAN, AS .
JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 1985, 11 (06) :788-798
[94]   Processing of low-probability sounds by cortical neurons [J].
Ulanovsky, N ;
Las, L ;
Nelken, I .
NATURE NEUROSCIENCE, 2003, 6 (04) :391-398
[95]  
Van Noorden L.P.A.S., 1975, TEMPORAL COHERENCE P, DOI 10.6100/IR152538
[96]  
VARGA AP, 1990, INT CONF ACOUST SPEE, P845, DOI 10.1109/ICASSP.1990.115970
[97]   TEMPORAL-MODULATION TRANSFER-FUNCTIONS BASED UPON MODULATION THRESHOLDS [J].
VIEMEISTER, NF .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 66 (05) :1364-1380
[98]   A NEURAL COCKTAIL-PARTY PROCESSOR [J].
VONDERMALSBURG, C ;
SCHNEIDER, W .
BIOLOGICAL CYBERNETICS, 1986, 54 (01) :29-40
[99]   Separation of speech from interfering sounds based on oscillatory correlation [J].
Wang, DLL ;
Brown, GJ .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (03) :684-697
[100]   SPECTRAL SHAPE-ANALYSIS IN THE CENTRAL AUDITORY-SYSTEM [J].
WANG, KS ;
SHAMMA, SA .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (05) :382-395