AUTOMATIC RECOGNITION OF KEYWORDS IN UNCONSTRAINED SPEECH USING HIDDEN MARKOV MODELS

Cited by: 195
Authors
WILPON, JG
RABINER, LR
LEE, CH
GOLDMAN, ER
Affiliation
[1] AT&T Bell Laboratories, Murray Hill
Source
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING | 1990 / Vol. 38 / No. 11
DOI
10.1109/29.103088
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speaker independent recognition of small vocabularies, spoken over the long-distance telephone network, has been demonstrated to be a viable technology. However, the algorithms tested and the tasks evaluated typically assume that user input be restricted to only a set of defined vocabulary words. Recently, a large scale trial of speaker independent isolated word speech recognition technology was carried out in Hayward, CA. The task chosen required that users speak, in isolation, one of five defined vocabulary words (collect, calling card, person, third number, and operator). Recognition results were obtained which showed that when users spoke the vocabulary words in an isolated fashion, the words were correctly recognized about 99% of the time. However, observations of customer responses during this trial showed that about 20% of the utterances had the desired vocabulary word along with extraneous input which ranged from nonspeech sounds (e.g., clicks and breath noises) to groups of nonvocabulary words (e.g., “I want to make a collect call please”). Most conventional recognition algorithms have not been designed to handle this type of input. As such, modification of the algorithms had to be made to recognize vocabulary words embedded in speech (i.e., a form of keyword spotting). This paper describes the modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMM's) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion. The novelty of our approach is that we create statistical models of both the actual vocabulary words and the extraneous speech and background. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models for matching the actual input. 
Word recognition accuracy of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise were present), and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech, were obtained for the five word vocabulary using the proposed recognition algorithm. © 1990 IEEE
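The idea described in the abstract, decoding the input as the best sequence of background, extraneous-speech, and vocabulary word models, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that characters stand in for acoustic frames (with `_` as silence), that each keyword is a left-to-right chain with one state per letter, that the filler and background models are single states, and that a fixed `switch_penalty` replaces trained transition probabilities. All names (`emit`, `build_network`, `decode`, `spot`) are hypothetical.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz_"  # '_' stands in for silence (assumption)

def emit(favoured, p=0.9):
    """Log-probabilities over ALPHABET with total mass p on the `favoured` symbols."""
    rest = (1.0 - p) / (len(ALPHABET) - len(favoured))
    return {s: math.log(p / len(favoured) if s in favoured else rest)
            for s in ALPHABET}

def build_network(keywords):
    """States are (model_name, emission_log_probs, is_first, is_last)."""
    states = [("background", emit("_"), True, True),
              ("filler", emit(ALPHABET[:-1]), True, True)]
    for word in keywords:  # one left-to-right state per letter
        for i, ch in enumerate(word):
            states.append((word, emit(ch), i == 0, i == len(word) - 1))
    return states

def decode(symbols, states, switch_penalty=-2.0):
    """Viterbi search over a loop grammar: any model may follow any model."""
    neg_inf = float("-inf")
    score = [st[1][symbols[0]] if st[2] else neg_inf for st in states]
    back = []
    for sym in symbols[1:]:
        new, ptr = [], []
        for i, (_, em, first, _) in enumerate(states):
            best, arg = score[i], i                      # self-loop
            if not first and score[i - 1] > best:        # advance inside a word
                best, arg = score[i - 1], i - 1
            if first:                                    # enter from any model exit
                for j, prev in enumerate(states):
                    if prev[3] and score[j] + switch_penalty > best:
                        best, arg = score[j] + switch_penalty, j
            new.append(best + em[sym])
            ptr.append(arg)
        score = new
        back.append(ptr)
    # The path must end in the final state of some model.
    end = max((i for i, st in enumerate(states) if st[3]), key=lambda i: score[i])
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def spot(text, keywords):
    """Return (model_name, start, end_exclusive) segments for `text`."""
    states = build_network(keywords)
    path = decode(text, states)
    segments = []
    for t, s in enumerate(path):
        name, _, first, _ = states[s]
        if t == 0 or (first and s != path[t - 1]):       # a new model was entered
            segments.append([name, t, t + 1])
        else:
            segments[-1][2] = t + 1
    return [tuple(seg) for seg in segments]

segments = spot("__a_collect_call__", ["collect", "operator"])
hits = [seg for seg in segments if seg[0] in ("collect", "operator")]
```

In this sketch the filler and background states absorb everything that is not a vocabulary word, so `hits` contains only the span where the keyword model outscores them. Input must use only symbols from `ALPHABET`; a real system would replace the character emissions with trained acoustic likelihoods.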
Pages: 1870-1878
Number of pages: 9