Finding short DNA motifs using permuted Markov models

Cited by: 49
Authors
Zhao, XY
Huang, HY
Speed, TP
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Walter & Eliza Hall Inst Med Res, Div Genet & Bioinformat, Melbourne, Vic 3050, Australia
Keywords
permuted variable length Markov models; maximal dependence decomposition models; weight matrix models; model selection; DNA motifs;
DOI
10.1089/cmb.2005.12.894
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Many short DNA motifs, such as transcription factor binding sites (TFBS) and splice sites, exhibit strong local as well as nonlocal dependence. We introduce permuted variable length Markov models (PVLMM), which can capture the potentially important dependencies among positions, and apply them to the problem of detecting splice sites and TFBS. The models perform satisfactorily in prediction and also give ready biological interpretations of the observed sequence dependence. The issue of model selection is also studied.
Pages: 894-906
Number of pages: 13