ANALYSIS OF THE CORRELATION STRUCTURE FOR A NEURAL PREDICTIVE MODEL WITH APPLICATION TO SPEECH RECOGNITION

Cited by: 18
Authors
DENG, L
HASSANEIN, K
ELMASRY, M
Affiliation
[1] Univ of Waterloo, Waterloo, Canada
Keywords
TEMPORAL CORRELATIONS; JOINT LINEAR NONLINEAR PREDICTION; MULTILAYER PERCEPTRON; HMM
DOI
10.1016/0893-6080(94)90027-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A speech recognizer is developed using a layered feedforward neural network to implement speech-frame prediction. A Markov chain is used to control changes in the network's weight parameters. We postulate that speech recognition accuracy is closely linked to the ability of the predictive model to represent long-term temporal correlations in speech data. Analytical expressions are obtained for the correlation functions of various types of predictive models (linear, compressively nonlinear, and jointly linear and compressively nonlinear) to determine how faithful the models are to the actual speech data. Analytical results, computer simulations, and speech recognition experiments suggest that when compressive nonlinear prediction and linear prediction are jointly performed within the same layer of the neural network, the model captures long-term data correlations better and consequently improves speech recognition performance.
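
The joint prediction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architectural details, so the use of tanh as the compressive nonlinearity, the single previous frame as input, and all names (`predict_frame`, `prediction_error`, the weight layout) are assumptions made for illustration only.

```python
import math

def predict_frame(prev_frame, weights):
    """Predict the next frame from the previous one.

    Assumed layout (hypothetical): weights["hidden"] is a list of
    (input_weights, bias, kind) with kind "linear" or "tanh", so linear and
    compressive units sit in the same hidden layer; weights["output"] is a
    list of (hidden_weights, bias) for the linear output units.
    """
    hidden = []
    for row, bias, kind in weights["hidden"]:
        pre = sum(w * x for w, x in zip(row, prev_frame)) + bias
        # Linear units pass the sum through; tanh units compress it.
        hidden.append(pre if kind == "linear" else math.tanh(pre))
    return [sum(w * h for w, h in zip(row, hidden)) + bias
            for row, bias in weights["output"]]

def prediction_error(frames, state_path, state_weights):
    """Sum of squared prediction errors along a given Markov state path.

    state_weights maps each state to its own weight set, mirroring the idea
    that the Markov chain controls which network weights are active.
    """
    total = 0.0
    for t in range(1, len(frames)):
        pred = predict_frame(frames[t - 1], state_weights[state_path[t]])
        total += sum((p - y) ** 2 for p, y in zip(pred, frames[t]))
    return total

# One-dimensional toy weights: one linear and one tanh hidden unit.
toy = {
    "hidden": [([1.0], 0.0, "linear"), ([1.0], 0.0, "tanh")],
    "output": [([1.0, 0.0], 0.0)],  # output reads only the linear unit
}
```

With these toy weights the predictor simply copies the previous frame, so a constant sequence yields zero error; shifting the output weights onto the tanh unit gives the purely compressive case.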
Pages: 331-339
Page count: 9