How Do Humans Process and Recognize Speech?

Cited by: 196
Author
Allen, Jont B. [1]
Affiliation
[1] AT&T Bell Labs, Acoust Res Dept, Murray Hill, NJ 07974 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1994 / Vol. 2 / No. 4
DOI
10.1109/89.326615
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Until the performance of automatic speech recognition (ASR) hardware surpasses human performance in accuracy and robustness, we stand to gain by understanding the basic principles behind human speech recognition (HSR). This problem was studied exhaustively at Bell Labs between 1918 and 1950 by Harvey Fletcher and his colleagues. The motivation for these studies was to quantify the quality of speech sounds in the telephone plant in order to improve both speech intelligibility and preference. To do this, he and his group studied the effects of filtering and noise on speech recognition accuracy for nonsense consonant-vowel-consonant (CVC) syllables, words, and sentences. Fletcher used the term "articulation" for the probability of correct recognition of nonsense sounds, and "intelligibility" for the probability of correct recognition of words (sounds having meaning). In 1919, Fletcher found a way to transform articulation data for filtered speech into an additive density function D(f) and found a formula that accurately predicts the average articulation. The area under D(f) is called the "articulation index." Fletcher then went on to find relationships between the recognition errors for the nonsense speech sounds, words, and sentences. This work has recently been reviewed and partially replicated by Boothroyd and by Bronkhorst et al. Taken as a whole, these studies tell us a great deal about how humans process and recognize speech sounds.
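Illustrative note (not part of the original record): the idea of an additive density D(f) whose area is the articulation index can be sketched numerically. The abstract gives neither the shape of D(f) nor the formula that maps the index to articulation, so everything specific below is an assumption made for illustration: `density` is a hypothetical D(f) that is uniform in log-frequency between 200 Hz and 8 kHz, the mapping s = 1 - e_min^AI is assumed, and `E_MIN = 0.015` is an assumed constant.

```python
import math

E_MIN = 0.015  # assumed residual error at AI = 1 (illustrative value)


def density(f_hz, f_lo=200.0, f_hi=8000.0):
    """Hypothetical D(f): uniform in log-frequency, unit area on [f_lo, f_hi]."""
    if not (f_lo <= f_hz <= f_hi):
        return 0.0
    return 1.0 / (f_hz * math.log(f_hi / f_lo))


def articulation_index(f1, f2, steps=10000):
    """Area under D(f) between f1 and f2, using the midpoint rule."""
    width = (f2 - f1) / steps
    return sum(density(f1 + (i + 0.5) * width) for i in range(steps)) * width


def articulation(ai):
    """Assumed mapping from articulation index to probability correct."""
    return 1.0 - E_MIN ** ai


if __name__ == "__main__":
    low = articulation_index(200.0, 1000.0)    # low-pass band
    high = articulation_index(1000.0, 8000.0)  # high-pass band
    full = articulation_index(200.0, 8000.0)   # full band
    print(f"AI low={low:.3f} high={high:.3f} full={full:.3f}")
    print(f"s low={articulation(low):.3f} high={articulation(high):.3f} "
          f"full={articulation(full):.3f}")
```

Under these assumptions the index of two adjacent bands is the sum of their individual indices, and the full-band error equals the product of the two band errors. That is the sense in which an additive density is convenient for predicting articulation of filtered speech.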
Pages: 567-577
Page count: 11
Related papers
20 in total
[1] ALLEN JB, 1994, ASA REPRINT SPEECH H
[2] [Anonymous], 1990, AUDITORY SCENE ANAL, DOI 10.1121/1.408434
[3] Boothroyd A., 1993, ACOUSTICAL FACTORS A, V2, P277
[4] BOOTHROYD A, 1968, J ACOUST SOC AM, V43
[5] BOOTHROYD A, 1988, J ACOUST SOC AM, V84
[6] BRAIDA LD, 1993, SPEECH COMMUN, P1
[7] BRAIDA LD, 1991, Q J EXPER PSYCHOL A, V43
[8] A MODEL FOR CONTEXT EFFECTS IN SPEECH RECOGNITION [J]. BRONKHORST, AW; BOSMAN, AJ; SMOORENBURG, GF. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1993, 93 (01): 499-509
[9] BRONKHORST AW, 1938, J ACOUST SOC AM, V9, P275
[10] BRONKHORST AW, 1994, ASA EDITION SPEECH H