Algorithms for variable length Markov chain modeling

Cited by: 27
Authors
Bejerano, G [1]
Affiliations
[1] Univ Calif Santa Cruz, Sch Engn, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
DOI
10.1093/bioinformatics/btg489
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Summary: We present a general-purpose implementation of variable length Markov models. Unlike fixed-order Markov models, these models are not restricted to a predefined uniform depth. Instead, a model is constructed from the training data that fits higher-order Markov dependencies where such contexts exist and uses lower-order dependencies elsewhere. As both theoretical and experimental results show, these models can capture rich signals from a modest amount of training data without the use of hidden states.
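The idea of fitting deep contexts only where the data supports them can be illustrated with a minimal Python sketch. This is not the implementation the paper describes: the function names (`train_vlmc`, `longest_context`, `next_symbol_prob`), the parameters `max_depth` and `min_count`, and the toy training string are all assumptions made for illustration. A simple count threshold stands in for the statistical pruning criteria used in the literature, and the probabilities are unsmoothed.

```python
from collections import defaultdict


def train_vlmc(sequences, max_depth=3, min_count=2):
    """Count next-symbol frequencies for every context of length 0..max_depth.

    Contexts observed fewer than min_count times are discarded (assumed,
    simplified pruning rule), so prediction later falls back to a shorter
    context. The empty context is always kept as the order-0 fallback.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for d in range(min(i, max_depth) + 1):
                ctx = seq[i - d:i]  # the d symbols preceding position i
                counts[ctx][sym] += 1

    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        if ctx == "" or total >= min_count:
            model[ctx] = {s: c / total for s, c in nxt.items()}
    return model


def longest_context(model, history, max_depth=3):
    """Return the longest suffix of history that is a retained context."""
    for d in range(min(len(history), max_depth), -1, -1):
        ctx = history[len(history) - d:]
        if ctx in model:
            return ctx
    return ""


def next_symbol_prob(model, history, symbol, max_depth=3):
    """Unsmoothed probability of symbol given the longest matching context."""
    ctx = longest_context(model, history, max_depth)
    return model[ctx].get(symbol, 0.0)


# Toy usage: the context length used for prediction varies with the history.
model = train_vlmc(["ababab"], max_depth=2, min_count=2)
print(longest_context(model, "xab", max_depth=2))  # depth-2 context is retained
print(longest_context(model, "ca", max_depth=2))   # falls back to a depth-1 context
print(next_symbol_prob(model, "zz", "a", max_depth=2))  # order-0 fallback
```

In this sketch the effective order is chosen per prediction: a history ending in a well-supported context uses it, while an unseen history falls back to shorter contexts and, in the last resort, to the order-0 symbol frequencies.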
Pages: 788 / U729
Page count: 10
Related papers
4 items in total
[1] Bejerano, G; Yona, G. Variations on probabilistic suffix trees: statistical modeling and prediction of protein families [J]. BIOINFORMATICS, 2001, 17 (01): 23-43
[2] BEJERANO G, 2003, THESIS HEBREW U
[3] Durbin R., 1998, Biological sequence analysis: Probabilistic models of proteins and nucleic acids
[4] Ron D, 1996, MACH LEARN, V25, P117, DOI 10.1007/BF00114008