Separation of speech from interfering sounds based on oscillatory correlation

Cited by: 181
Authors
Wang, DLL [1]
Brown, GJ
Affiliations
[1] Ohio State Univ, Dept Comp & Informat Sci, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
[3] Univ Sheffield, Dept Comp Sci, Sheffield S8 0ET, S Yorkshire, England
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1999 / Vol. 10 / No. 3
Funding
US National Science Foundation; UK Engineering and Physical Sciences Research Council
Keywords
auditory scene analysis; harmonicity; oscillatory correlation; speech segregation; stream segregation;
DOI
10.1109/72.761727
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity and proximity in frequency and time. Prior to the oscillator network are a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and produces improvements in terms of signal-to-noise ratio for every mixture. The performance of our model is compared with other studies on computational auditory scene analysis. A number of issues, including biological plausibility and real-time implementation, are also discussed.
Pages: 684-697
Page count: 14