Speech recognition by machines and humans

Times cited: 293
Author
Lippmann, RP
Affiliation
[1] Lincoln Laboratory, MIT, Lexington, MA 02173-9108
Keywords
speech recognition; speech perception; speech; perception; automatic speech recognition; machine recognition; performance; noise; nonsense syllables; nonsense sentences
DOI
10.1016/S0167-6393(97)00021-6
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech. (C) 1997 Elsevier Science B.V.
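Error rates in human-versus-machine comparisons of this kind are commonly reported as word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn a recognizer's output into the reference transcript, divided by the number of reference words. The sketch below is only an illustration and assumes that standard definition. The function name `word_error_rate` is hypothetical and is not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference words.

    Assumes the standard Levenshtein (edit-distance) alignment over
    whitespace-separated words; hypothetical helper, not from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion of a reference word
                dist[i][j - 1] + 1,         # insertion of an extra word
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words gives a WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, an "order of magnitude" gap means that the machine WER on the same test material is roughly ten or more times the human WER.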
Pages: 1-15
Page count: 15