ExonHunter: a comprehensive approach to gene finding

Cited by: 30
Authors
Brejová, B [1]
Brown, DG [1]
Li, M [1]
Vinar, T [1]
Affiliations
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1093/bioinformatics/bti1040
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: We present ExonHunter, a new and comprehensive gene finding system that outperforms existing systems and features several new ideas and approaches. Our system combines numerous sources of information (genomic sequences, expressed sequence tags and protein databases of related species) into a gene finder based on a hidden Markov model in a novel and systematic way. In our framework, various sources of information are expressed as partial probabilistic statements about positions in the sequence and their annotation. We then combine these into the final prediction via a quadratic programming method, which we show to be an extension of existing methods. Allowing only partial statements is key to our transparent handling of missing information and coping with the heterogeneous character of individual sources of information. In addition, we give a new method for modeling the length distribution of intergenic regions in hidden Markov models.

Results: On a commonly used test set, ExonHunter performs significantly better than the existing gene finders ROSETTA, SLAM and TWINSCAN, with more than two-thirds of genes predicted completely correctly.
Pages: I57-I65
Page count: 9