Input-output HMM's for sequence processing

Cited by: 160
Authors
Bengio, Y
Frasconi, P
Affiliations
[1] AT&T BELL LABS, HOLMDEL, NJ 07733 USA
[2] UNIV FLORENCE, DIPARTIMENTO SISTEMI & INFORMAT, FLORENCE, ITALY
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1996 / Vol. 7 / No. 5
DOI
10.1109/72.536317
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider problems of sequence processing and propose a solution based on a discrete-state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call input-output hidden Markov model (IOHMM). It can be trained by the expectation-maximization (EM) or generalized EM (GEM) algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMM's), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMM's are trained using a more discriminant learning paradigm than HMM's, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMM's are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.
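The structure described in the abstract (one subnetwork per state, with transitions and outputs both conditioned on the current input) can be illustrated with a forward recursion that computes the probability of an output sequence given an input sequence. The following is only a minimal sketch under simplifying assumptions, not the authors' implementation: inputs and outputs are discrete symbols, each "subnetwork" is reduced to a softmax over a table of scores, and the names `iohmm_likelihood`, `trans_logits`, and `emit_logits` are hypothetical.

```python
import math


def softmax(logits):
    """Turn a list of real scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def iohmm_likelihood(inputs, outputs, init, trans_logits, emit_logits):
    """Return P(outputs | inputs) for a toy discrete IOHMM.

    Assumed (hypothetical) parameter layout:
      init[i]               probability of state i before the first input
      trans_logits[j][u][i] score for moving from state j to state i on input u
      emit_logits[i][u][y]  score for emitting y in state i on input u
    """
    n = len(init)
    alpha = list(init)  # alpha[i]: joint probability of outputs so far and state i
    for u, y in zip(inputs, outputs):
        # Input-conditioned transition distributions, one per previous state.
        phi = [softmax(trans_logits[j][u]) for j in range(n)]
        new_alpha = []
        for i in range(n):
            reach = sum(phi[j][i] * alpha[j] for j in range(n))
            emit = softmax(emit_logits[i][u])[y]
            new_alpha.append(emit * reach)
        alpha = new_alpha
    return sum(alpha)
```

In an EM setting, quantities such as `alpha` would feed the expectation step over the unobserved state trajectory; the sketch stops at the likelihood and does not attempt the parameter update.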
Pages: 1231-1249
Number of pages: 19
Related papers
61 in total
[1] Abu-Mostafa Y. S., 1990, Journal of Complexity, V6, P192, DOI 10.1016/0885-064X(90)90006-Y
[2] ANGLUIN D, 1983, ACM COMPUT SURV, V15, P237
[3] [Anonymous], NEUROCOMPUTING ALGOR
[4] [Anonymous], 1990, EUR ASS SIGN PROC WO
[5] BAKIS R, 1976, P 19 M AC SOC AM APR
[6] BALDI, P; CHAUVIN, Y. SMOOTH ONLINE LEARNING ALGORITHMS FOR HIDDEN MARKOV-MODELS [J]. NEURAL COMPUTATION, 1994, 6 (02): 307-318
[7] BAUM, LE; PETRIE, T; SOULES, G; WEISS, N. A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J]. ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01): 164-&
[8] BECKER S, 1989, 1988 P CONN MOD SUMM, P29
[9] BENGIO, Y; SIMARD, P; FRASCONI, P. LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02): 157-166
[10] Bengio, Y; Frasconi, P. Diffusion of context and credit information in Markovian models [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1995, 3: 249-270