LINKS BETWEEN MARKOV-MODELS AND MULTILAYER PERCEPTRONS

Cited by: 104
Authors
BOURLARD, H
WELLEKENS, CJ
Affiliation
[1] Philips Research Laboratory, B-1348 Louvain-la-Neuve
Keywords
Markov models; maximum a posteriori probability; maximum likelihood; multilayer perceptron; speech recognition;
DOI
10.1109/34.62605
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The statistical and sequential nature of the human speech production system makes automatic speech recognition difficult. Hidden Markov models (HMM) have provided a good representation of these characteristics of speech, and were a breakthrough in speech recognition research. But HMM's suffer from weak discriminative power. Recently, connectionist models have been recognized as an alternative tool. Their main properties are their discriminative power and their ability to capture input-output relationships. They have also proved useful in dealing with statistical data (e.g., phonetic classification of speech features). However, connectionist models to date are not well suited for dealing with time-varying input patterns. In this paper, the statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant HMM is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link is established between these discriminant HMM's, trained along the Viterbi algorithm, and any other approach based on least-mean-square minimization of an error function (LMSE). It is shown theoretically and experimentally that the outputs of the MLP (when trained along the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e., the maximum a posteriori (MAP) probabilities. Results of a series of speech recognition experiments are reported. It is shown that, by using contextual information at the input of the MLP, frame classification performance can be achieved which is significantly improved over the corresponding performance for simple maximum likelihood estimates (MLE), or even MAP without the benefit of context. On this basis, the possibility of embedding MLP into HMM is described.
Relations with other recurrent networks are also explained. © 1990 IEEE
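The abstract's central claim — that a network trained by least-mean-square error on class-indicator targets approximates the posterior P(class | input) — follows from the fact that the squared-error-optimal predictor of an indicator variable is its conditional expectation. A minimal sketch (not from the paper; the discrete feature, the assumed posteriors, and the closed-form linear fit in place of an MLP are illustrative choices) demonstrates this:

```python
# Hedged illustration of the LMSE-approximates-MAP property: a least-squares
# fit of a 0/1 class indicator on a one-hot-coded discrete input recovers the
# conditional class probabilities, because E[y|x] = P(y=1|x) minimizes the
# expected squared error. A linear closed-form fit stands in for the MLP.
import numpy as np

rng = np.random.default_rng(0)
true_posterior = {0: 0.2, 1: 0.8}   # assumed P(class=1 | x), for illustration

# Generate labeled "frames": x is a discrete feature, y its class indicator.
x = rng.integers(0, 2, size=50_000)
y = (rng.random(x.size) < np.vectorize(true_posterior.get)(x)).astype(float)

# One-hot code the input and solve the least-squares problem in closed form.
X = np.eye(2)[x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w)  # each fitted weight approximates P(class=1 | x): roughly [0.2, 0.8]
```

With enough samples per input value, the fitted outputs converge to the true posteriors rather than to hard 0/1 decisions, which is what licenses using scaled network outputs as emission probabilities inside an HMM.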
Pages: 1167-1178 (12 pages)
References (46 total)
[1] Bahl LR, Jelinek F, Mercer RL. A maximum-likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, 5(2): 179-190.
[2] Bahl LR, 1986, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p. 49.
[3] Bourlard H, 1989, Computer Speech and Language, vol. 3, p. 1. DOI: 10.1016/0885-2308(89)90011-9.
[4] Bourlard H, 1987, Proc. 1st IEEE Int. Conf. on Neural Networks, p. 407.
[5] Bourlard H, 1986, Proc. EUSIPCO 86, p. 507.
[6] Bourlard H, 1985, SPEECH SPEAKER RECOG.
[7] Bourlard H, 1990, Advances in Neural Information Processing Systems, vol. 2, p. 186.
[8] Bridle JS, 1982, Proc. ICASSP 82, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, p. 899.
[9] Bridle JS, 1989, P C NEURAL NETWORK C.
[10] Brown PF, 1987, thesis, Carnegie Mellon University.