DATA DRIVEN SEARCH ORGANIZATION FOR CONTINUOUS SPEECH RECOGNITION

Cited by: 38
Authors
NEY, H [1]
MERGEL, D [1]
NOLL, A [1]
PAESELER, A [1]
Affiliation
[1] ASPECT GMBH, W-2000 NORDERSTEDT, GERMANY
DOI
10.1109/78.124938
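Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract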
This paper describes an architecture and search organization for continuous speech recognition. The recognition module is part of the SPICOS system for the understanding of database queries spoken in natural language. The recognition is based on statistical decision theory and thus amounts to an integrated approach that combines all available knowledge sources, such as the inventory of subword units, the pronunciation lexicon, and the language model, and attempts to avoid local decisions during the process of acoustic recognition. The recognition decision amounts to a time-synchronous, left-to-right search through a large state space with delayed decisions. The recognized word sequence is then the best interpretation of the observed acoustic data within the constraints given by the knowledge sources. The organization of the search can be viewed as an extension of the one-pass dynamic programming algorithm for connected word recognition. In continuous speech recognition, however, the search space is much larger, and an efficient organization of the search process is called for in order to keep the organizational overhead as small as possible. In this paper, we present such an efficient search organization with the following characteristics. Its computational cost is proportional only to the number of hypotheses actually generated and is independent of the overall size of the potential search space. There is no limit to the number of word hypotheses; there is only a limit to the overall number of hypotheses due to storage constraints. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. In particular, the efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.
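The abstract's central idea, a time-synchronous search whose cost depends only on the hypotheses actually generated, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `time_synchronous_search`, the callbacks `successors` and `local_cost`, and the pruning parameters `beam` and `max_active` are all hypothetical and chosen only for illustration. The sketch keeps the active hypotheses in a dictionary, so each frame touches only states that are currently alive, and it defers the decision to a traceback after the last frame.

```python
import math


def time_synchronous_search(frames, start_state, successors, local_cost,
                            beam=10.0, max_active=1000):
    """Illustrative time-synchronous beam search (hypothetical interface).

    successors(state) yields (next_state, transition_cost) pairs.
    local_cost(state, frame) returns the acoustic cost of a frame in a state.
    Returns (best_path, best_cost); best_path starts with start_state.
    """
    active = {start_state: 0.0}   # only hypotheses that are currently alive
    backpointers = []             # one dict per frame: state -> predecessor

    for frame in frames:
        expanded, back = {}, {}
        # Work is proportional to the number of active hypotheses,
        # not to the size of the full state space.
        for state, cost in active.items():
            for nxt, trans_cost in successors(state):
                new_cost = cost + trans_cost + local_cost(nxt, frame)
                if new_cost < expanded.get(nxt, math.inf):
                    expanded[nxt] = new_cost
                    back[nxt] = state
        if not expanded:
            raise ValueError("no hypothesis survived this frame")

        # Beam pruning relative to the best hypothesis of this frame,
        # plus a hard cap that models the storage limit.
        best = min(expanded.values())
        survivors = sorted(
            (item for item in expanded.items() if item[1] <= best + beam),
            key=lambda item: item[1],
        )[:max_active]
        active = dict(survivors)
        backpointers.append(back)

    # Delayed decision: choose the best final hypothesis, then trace back.
    state = min(active, key=active.get)
    best_cost = active[state]
    path = [state]
    for back in reversed(backpointers):
        state = back[state]
        path.append(state)
    path.reverse()
    return path, best_cost
```

In this sketch, a state would stand for a position inside a word or subword model, and language-model costs would enter through the transition costs returned by `successors`; both mappings are assumptions made here, not details taken from the abstract.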
Pages: 272-281
Page count: 10