AN OVERVIEW OF THE SPHINX SPEECH RECOGNITION SYSTEM

Cited by: 166
Authors
LEE, KF
HON, HW
REDDY, R
Affiliation
[1] School of Computer Science, Carnegie Mellon University, Pittsburgh
Source
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING | 1990 / Vol. 38 / No. 01
Funding
National Science Foundation (USA);
DOI
10.1109/29.45616
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speaker independence, continuous speech, and large vocabularies pose three of the greatest challenges in automatic speech recognition. Previously, accurate speech recognizers avoided dealing simultaneously with all three problems. This paper describes SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMM's) with LPC-derived parameters. To provide speaker independence, we added knowledge to these HMM's in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, we introduce two new subword speech units: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96 percent on a 997-word task. © 1990 IEEE
Pages: 35-45
Page count: 11