A Bayesian framework for combining gene predictions

Cited by: 35
Authors
Pavlovic, V [1]
Garg, A
Kasif, S
Affiliations
[1] Boston Univ, Dept Bioengn, Bioinformat Program, Boston, MA 02215 USA
[2] Univ Illinois, Beckman Inst, Urbana, IL 61801 USA
[3] Compaq Comp Corp, Cambridge Res Lab, Cambridge, MA USA
Keywords
DOI
10.1093/bioinformatics/18.1.19
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Gene identification and gene discovery in new genomic sequences is one of the most timely computational questions addressed by bioinformatics scientists. This computational research has resulted in several systems that have been used successfully in many whole-genome analysis projects. As the number of such systems grows, the need for a rigorous way to combine their predictions becomes more pressing.
Results: In this paper we provide a Bayesian network framework for combining gene predictions from multiple systems. The framework allows us to treat the problem as combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We introduce, for the first time, the use of hidden input/output Markov models for combining gene predictions. We apply the framework to the analysis of the Adh region in Drosophila, which has been carefully studied in the context of gene finding and was used as the basis for the GASP competition. The main challenge in combining gene prediction programs is that the systems rely on similar features, such as codon usage, and as a result their predictions are often correlated. We show that our approach promises to improve prediction accuracy and provides a systematic and flexible framework for incorporating multiple sources of evidence into gene prediction systems.
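To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's method, of two simple ways to combine per-position coding/non-coding calls from several gene finders: the majority-voting baseline the abstract mentions, and a naive Bayes posterior that weights each expert by its accuracy. The function names, the sensitivity/specificity values, and the uniform prior are all hypothetical and chosen only for illustration. The naive Bayes variant assumes experts err independently, which is exactly the assumption the abstract says is violated when predictors share features such as codon usage; the paper's input/output hidden Markov models are meant to go beyond this.

```python
from collections import Counter
from math import exp, log


def majority_vote(labels):
    """Return the label chosen by the most experts (1 = coding, 0 = non-coding)."""
    return Counter(labels).most_common(1)[0][0]


def naive_bayes_combine(labels, sensitivity, specificity, prior_coding=0.5):
    """Posterior P(coding | expert labels), ASSUMING experts err independently.

    sensitivity[i] = P(expert i says coding     | truly coding)
    specificity[i] = P(expert i says non-coding | truly non-coding)
    These per-expert accuracies are hypothetical inputs, not values from the paper.
    """
    log_coding = log(prior_coding)
    log_noncoding = log(1.0 - prior_coding)
    for label, sn, sp in zip(labels, sensitivity, specificity):
        if label == 1:
            log_coding += log(sn)
            log_noncoding += log(1.0 - sp)
        else:
            log_coding += log(1.0 - sn)
            log_noncoding += log(sp)
    # Normalize in log space for numerical stability.
    top = max(log_coding, log_noncoding)
    p_coding = exp(log_coding - top)
    p_noncoding = exp(log_noncoding - top)
    return p_coding / (p_coding + p_noncoding)


# One reliable expert says "coding"; two weak experts say "non-coding".
votes = [1, 0, 0]
sens = [0.95, 0.60, 0.60]  # illustrative values only
spec = [0.95, 0.60, 0.60]  # illustrative values only

vote_result = majority_vote(votes)
posterior = naive_bayes_combine(votes, sens, spec)
```

In this made-up example the majority vote follows the two weak experts, while the accuracy-weighted posterior favors the single reliable one. If the two weak experts were strongly correlated, even the naive Bayes combination would double-count their evidence, which is the motivation for a framework that models dependencies explicitly.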
Pages: 19-27
Page count: 9