NEURAL NETWORKS FOR STATISTICAL RECOGNITION OF CONTINUOUS SPEECH

Cited by: 63
Authors
MORGAN, N [1]
BOURLARD, HA [1]
Affiliations
[1] FAC POLYTECH MONS,B-7000 MONS,BELGIUM
Keywords
DOI
10.1109/5.381844
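Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract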
In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems (using context-independent phone models) based on this approach have been proved, on controlled tests, to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
Pages: 742-770
Number of pages: 29