Evaluation of gene-finding programs on mammalian sequences

被引:167
作者
Rogic, S [1 ]
Mackworth, AK
Ouellette, FBF
机构
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
[3] Ctr Mol Med & Therapeut, Vancouver, BC V5Z 4H4, Canada
关键词
D O I
10.1101/gr.147901
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We present an independent comparative analysis of seven recently developed gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan, and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and biologically validated dataset of mammalian genomic sequences that does not overlap with the training sets of the programs analyzed. Our analysis shows that the new generation of programs has substantially better results than the programs analyzed in previous studies. The accuracy of the programs was also examined as a function of various sequence and prediction features, such as G + C content of the sequence, length and type of exons, signal type, and score of the exon prediction. This approach pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general. The dataset used in this analysis (HMR195) as well as the tables with the complete results are available at http://www.cs.ubc.ca/similar to rogic/evaluation/.
引用
收藏
页码:817 / 832
页数:16
相关论文
共 45 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[3]  
[Anonymous], 1997, THESIS STANFORD U ST
[4]  
Ashburner M, 1999, GENETICS, V153, P179
[5]   GenBank [J].
Benson, DA ;
Karsch-Mizrachi, I ;
Lipman, DJ ;
Ostell, J ;
Rapp, BA ;
Wheeler, DL .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :15-18
[6]   THE ISOCHORE ORGANIZATION OF THE HUMAN GENOME AND ITS EVOLUTIONARY HISTORY - A REVIEW [J].
BERNARDI, G .
GENE, 1993, 135 (1-2) :57-66
[7]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[8]   Finding the genes in genomic DNA [J].
Burge, CB ;
Karlin, S .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 1998, 8 (03) :346-354
[9]   Evaluation of gene structure prediction programs [J].
Burset, M ;
Guigo, R .
GENOMICS, 1996, 34 (03) :353-367
[10]   Genome sequence of the nematode C-elegans:: A platform for investigating biology [J].
不详 .
SCIENCE, 1998, 282 (5396) :2012-2018