ADEQUACY OF AUDITORY MODELS TO PREDICT HUMAN INTERNAL REPRESENTATION OF SPEECH SOUNDS

Cited by: 17
Author
GHITZA, O
Affiliation
[1] AT&T Bell Laboratories, Acoustics Research Department, Murray Hill
DOI
10.1121/1.406679
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A long-standing question that arises when studying a particular auditory model is how to evaluate its performance. More precisely, it is of interest to evaluate to what extent the model representation can describe the actual human internal representation. Here, this question is addressed in the context of speech perception: given a speech representation based on the auditory system, to what extent does it preserve perceptually relevant phonetic information? To answer this question, a diagnostic system has been developed that simulates the psychophysical procedure used in the standard Diagnostic-Rhyme Test (DRT). In the psychophysical procedure, the subject has, a priori, all the cognitive information needed for the discrimination task. Hence, errors in discrimination are due mainly to inaccuracies in the auditory representation of the stimulus. In the simulation, the human observer is replaced by an array of recognizers, one for each pair of words in the DRT database. The recognizer used [Ghitza and Sondhi, J. Acoust. Soc. Am. Suppl. 1 87, S107 (1990)] was designed to keep errors due to the recognition procedure to a minimum, so that the overall detected errors are due mainly to inaccuracies in the front-end representation. The system provides detailed diagnostics that show the error distributions among six phonetically distinctive features. Results are given for two speech analysis methods, a representation based on the auditory system and one based on the Fourier power spectrum, in quiet and in a noisy environment, and are compared with psychophysical results for the same database.
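The scoring side of such a diagnostic system can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on the following assumptions. The six feature names are those of the standard DRT: voicing, nasality, sustention, sibilation, graveness, and compactness. Each word pair gets its own two-alternative recognizer; here that recognizer is a hypothetical nearest-template rule (`two_alternative_choice`) standing in for the Ghitza and Sondhi recognizer. Each feature is scored with the usual DRT correction for guessing, 100 * (R - W) / T, where R and W count right and wrong responses and T = R + W. The names `Trial`, `drt_scores`, and the toy data are illustrative only.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Sequence

# The six distinctive features scored by the standard DRT.
FEATURES = ("voicing", "nasality", "sustention", "sibilation", "graveness", "compactness")


@dataclass
class Trial:
    feature: str                     # feature that distinguishes the word pair
    pair_id: int                     # DRT word pair; one recognizer per pair
    spoken: int                      # 0 or 1: which word of the pair was presented
    representation: Sequence[float]  # front-end output for the token (hypothetical)


def two_alternative_choice(x: Sequence[float], templates: tuple) -> int:
    """Hypothetical pair recognizer: choose the nearer of two reference templates."""
    d0 = math.dist(x, templates[0])
    d1 = math.dist(x, templates[1])
    return 0 if d0 <= d1 else 1


def drt_scores(trials: list[Trial], templates_by_pair: dict) -> dict[str, float]:
    """Per-feature DRT score, corrected for guessing: 100 * (R - W) / T."""
    right: dict[str, int] = defaultdict(int)
    wrong: dict[str, int] = defaultdict(int)
    for t in trials:
        if t.feature not in FEATURES:
            raise ValueError(f"unknown DRT feature: {t.feature}")
        # Each trial is judged only by the recognizer for its own word pair.
        choice = two_alternative_choice(t.representation, templates_by_pair[t.pair_id])
        if choice == t.spoken:
            right[t.feature] += 1
        else:
            wrong[t.feature] += 1
    scores = {}
    for f in FEATURES:
        total = right[f] + wrong[f]
        if total:  # report only features that were actually tested
            scores[f] = 100.0 * (right[f] - wrong[f]) / total
    return scores


if __name__ == "__main__":
    # Toy 2-D "representations" for a single voicing pair (illustrative values only).
    templates = {0: ([0.0, 0.0], [1.0, 1.0])}
    trials = [
        Trial("voicing", 0, 0, [0.1, 0.0]),  # recognized correctly
        Trial("voicing", 0, 1, [0.9, 1.0]),  # recognized correctly
        Trial("voicing", 0, 0, [0.8, 0.9]),  # confused with the other word
    ]
    print(round(drt_scores(trials, templates)["voicing"], 2))  # -> 33.33
```

Running the same trials through two different front ends, for example an auditory-model representation and a Fourier power spectrum, and comparing the per-feature scores mirrors the comparison described in the abstract. The nearest-template rule is only a placeholder. Because it is deliberately simple, any recognizer that keeps its own errors low could be substituted without changing the scoring.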
Pages: 2160-2171
Number of pages: 12
References
20 entries in total
[1] Bahl, L. R., Jelinek, F., and Mercer, R. L. (1983). A maximum-likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2), 179-190.
[3] Ghitza, O. (1986). Computer Speech and Language, 1, 109. DOI 10.1016/S0885-2308(86)80018-3
[4] Ghitza, O. (1993). Computer Speech and Language, in press.
[5] Ghitza, O. (1992). Advances in Speech Signal Processing, p. 453.
[6] Ghitza, O. (1990). Journal of the Acoustical Society of America, Suppl. 1, 87, S107.
[7] Goldstein, J. L. (1990). Modeling rapid waveform compression on the basilar membrane as multiple-bandpass-nonlinearity filtering. Hearing Research, 49(1-3), 39-60.
[8] Greenwood, D. D. (1990). A cochlear frequency-position function for several species: 29 years later. Journal of the Acoustical Society of America, 87(6), 2592-2605.
[9] Jakobson, R. (1952). MIT Acoustics Laboratory Technical Report 13.
[10] Juang, B. H. (1991). Computer Speech and Language, 5, 275. DOI 10.1016/0885-2308(91)90011-E