Modeling splice sites with Bayes networks

Cited by: 64
Authors
Cai, DY [1]
Delcher, A
Kao, B
Kasif, S
Affiliations
[1] Univ Illinois, Dept Elect Engn & Comp Sci, Chicago, IL 60607 USA
[2] Loyola Coll, Dept Comp Sci, Baltimore, MD 21210 USA
[3] Celera Genom, Rockville, MD 20850 USA
Keywords
DOI
10.1093/bioinformatics/16.2.152
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The main goal in this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of transcription in human DNA). These methods can subsequently be utilized to improve the performance of gene-finding systems. The models built here attempt to model long-distance dependencies between non-adjacent bases.
Results: An efficient modeling method is described which models biological data more accurately than a first-order Markov model without increasing the number of parameters. Intuitively, a small number of parameters helps a learning system to avoid overfitting. Several experiments with the model are presented which show a small improvement in the average accuracy as compared with a simple Markov model. These experiments suggest that single long-distance dependencies do not help the recognition problem, thus confirming several previous studies which have used more heuristic modeling techniques.
Availability: This software is available for download and as a web resource at http://www.ai.uic.edu/software
Contact: kasif@eecs.uic.edu
Pages: 152-158
Page count: 7