Speech recognition by machines and humans

Cited by: 293
Author
Lippmann, RP
Institution
[1] Lincoln Laboratory MIT, Lexington, MA 02173-9108
Keywords
speech recognition; speech perception; speech; perception; automatic speech recognition; machine recognition; performance; noise; nonsense syllables; nonsense sentences
DOI
10.1016/S0167-6393(97)00021-6
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech. (C) 1997 Elsevier Science B.V.
Pages: 1-15
Page count: 15