COMPUTATIONAL AUDITORY SCENE ANALYSIS

Cited by: 290
Authors
BROWN, GJ
COOKE, M
Affiliation
[1] Department of Computer Science, University of Sheffield, Regent Court, Sheffield S1 4DP, UK
Keywords
DOI
10.1006/csla.1994.1016
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Although the ability of human listeners to perceptually segregate concurrent sounds is well documented in the literature, there have been few attempts to exploit this research in the design of computational systems for sound source segregation. In this paper, we present a segregation system that is consistent with psychological and physiological findings. The system is able to segregate speech from a variety of intrusive sounds, including other speech, with some success. The segregation system consists of four stages. Firstly, the auditory periphery is modelled by a bank of bandpass filters and a simulation of neuromechanical transduction by inner hair cells. In the second stage of the system, periodicities, frequency transitions, onsets and offsets in auditory nerve firing patterns are made explicit by separate auditory representations. The representations, auditory maps, are based on the known topographical organization of the higher auditory pathways. Information from the auditory maps is used to construct a symbolic description of the auditory scene. Specifically, the acoustic input is characterized as a collection of time-frequency elements, each of which describes the movement of a spectral peak in time and frequency. In the final stage of the system, a search strategy is employed which groups elements according to the similarity of their fundamental frequencies, onset times and offset times. Following the search, a waveform can be resynthesized from a group of elements so that segregation performance may be assessed by informal listening tests. The system has been evaluated using a database of voiced speech mixed with a variety of intrusive noises such as music, "office" noise and other speech. A technique for quantitative evaluation of the system is described, in which the signal-to-noise ratio (SNR) is compared before and after the segregation process. After segregation, an increase in SNR is obtained for each noise condition. Additionally, the performance of our system is significantly better than that of the frame-based segregation scheme described by Meddis and Hewitt (1992).
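The abstract outlines a four-stage pipeline; the stages most readily illustrated in code are the peripheral front end (a bank of bandpass filters) and the SNR comparison used for evaluation. Below is a minimal sketch in Python, assuming a fourth-order gammatone filterbank with centre frequencies spaced on the ERB-rate scale (Glasberg and Moore's formulas) and assuming that target and interference waveforms are separately available for the SNR measure. The channel count, frequency range, gain normalisation and function names are illustrative choices, not the parameters of Brown and Cooke's system.

    import numpy as np

    def erb(f):
        # Equivalent rectangular bandwidth (Glasberg & Moore) in Hz.
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def erb_space(low, high, n):
        # n centre frequencies equally spaced on the ERB-rate scale.
        lo = 21.4 * np.log10(4.37 * low / 1000.0 + 1.0)
        hi = 21.4 * np.log10(4.37 * high / 1000.0 + 1.0)
        rates = np.linspace(lo, hi, n)
        return (10.0 ** (rates / 21.4) - 1.0) * 1000.0 / 4.37

    def gammatone_filterbank(x, fs, low=100.0, high=4000.0, n_channels=32, dur=0.025):
        # Filter x through a bank of fourth-order gammatone filters and return
        # an (n_channels, len(x)) array of band-limited waveforms (a "cochleagram").
        t = np.arange(0.0, dur, 1.0 / fs)
        out = np.empty((n_channels, len(x)))
        for i, fc in enumerate(erb_space(low, high, n_channels)):
            b = 1.019 * erb(fc)                                # bandwidth parameter
            g = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
            g /= np.sum(np.abs(g)) + 1e-12                     # rough gain normalisation
            out[i] = np.convolve(x, g, mode="full")[:len(x)]   # causal filtering
        return out

    def snr_db(target, interference):
        # SNR in dB, assuming the target and interference are available separately.
        return 10.0 * np.log10(np.sum(target ** 2) / (np.sum(interference ** 2) + 1e-12))

    # Toy usage: a 440 Hz "voice" in white noise at 16 kHz.
    fs = 16000
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 440.0 * t)
    noise = 0.3 * np.random.randn(fs)
    print("input SNR: %.1f dB" % snr_db(target, noise))
    cochleagram = gammatone_filterbank(target + noise, fs)
    print(cochleagram.shape)   # (32, 16000)

In the full system this band-limited representation would feed the inner hair cell simulation and the auditory maps; the grouping search and resynthesis stages are not sketched here.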
Pages: 297-336
Number of pages: 40
References
59 items in total
[1] Anonymous. Proceedings of ICASSP.
[2] Assmann, P. F. & Summerfield, Q. (1990). Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. Journal of the Acoustical Society of America, 88(2), 680-697.
[3] Beauvois, M. W. & Meddis, R. (1991). A computer model of auditory stream segregation. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 43(3), 517-541.
[4] Beet, S. W. (1990). Computer Speech and Language, 4, 17. DOI: 10.1016/0885-2308(90)90021-W.
[5] Boer, E. (1968). IEEE Transactions on Biomedical Engineering, 15, 169.
[6] Boer, E. (1978). Journal of the Acoustical Society of America, 63, 115.
[7] Bregman, A. S. (1990). Auditory Scene Analysis.
[8] Brown, G. J. (1992). Proceedings of the Institute of Acoustics, 14, 439.
[9] Brown, G. J. (1992). PhD thesis, University of Sheffield.
[10] Carlson, R. (1982). REPRESENTATION SPEEC.