An analysis of gene-finding programs for Neurospora crassa

Cited by: 10
Authors
Kraemer, E [1]
Wang, J
Guo, JH
Hopkins, S
Arnold, J
Affiliations
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
[2] Univ Georgia, Dept Genet, Athens, GA 30602 USA
DOI
10.1093/bioinformatics/17.10.901
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Computational gene identification plays an important role in genome projects. Gene identification programs are often tuned to one particular organism, and accuracy for one organism or class of organisms does not necessarily carry over to others. In this paper we evaluate five computer programs on their ability to locate coding regions and to predict gene structure in Neurospora crassa. One of these programs (FFG) was designed specifically for gene finding in N. crassa, but its model parameters have not yet been fully tuned, so it should be viewed as an initial prototype. The other four programs were neither designed nor tuned for N. crassa.
Results: We describe the data sets on which the experiments were performed; the approaches employed by the five programs (GenScan, HMMGene, GeneMark, Pombe and FFG); the methodology of our evaluation; and the results of the experiments. Our results show that none of the programs performs consistently well. Overall, GenScan performs best on sensitivity and Missing Exons (ME), while HMMGene and FFG perform well at locating exons approximately. Additional work motivated by this study includes the creation of a tool for the automated evaluation of gene-finding programs, the collection of larger and more reliable data sets for N. crassa, parameterization of the model used in FFG to produce a more accurate gene-finding program for this species, and a more in-depth evaluation of the reasons that existing programs generally fail for N. crassa.
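The abstract names sensitivity and Missing Exons (ME) as evaluation measures but does not define them. As a minimal sketch, assuming the exon-level definitions commonly used when benchmarking gene finders (an exon counts as correct only if both boundaries match exactly, and as missing if no predicted exon overlaps it), the measures could be computed as below. The function name `exon_level_metrics`, the inclusive coordinate convention, and the example coordinates are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# An exon is a (start, end) pair with inclusive coordinates (assumed convention).
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_metrics(annotated: List[Exon], predicted: List[Exon]) -> Dict[str, float]:
    """Compute exon-level accuracy measures for one sequence.

    sensitivity   = exactly predicted exons / annotated exons
    specificity   = exactly predicted exons / predicted exons
    missing_exons = annotated exons with no overlapping prediction / annotated exons
    wrong_exons   = predicted exons with no overlapping annotation / predicted exons
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)

    # Exons whose start and end both match an annotated exon exactly.
    correct = annotated_set & predicted_set
    # Annotated exons that no prediction touches at all.
    missing = [a for a in annotated_set
               if not any(overlaps(a, p) for p in predicted_set)]
    # Predicted exons that touch no annotated exon at all.
    wrong = [p for p in predicted_set
             if not any(overlaps(p, a) for a in annotated_set)]

    n_annotated = len(annotated_set)
    n_predicted = len(predicted_set)
    return {
        "sensitivity": len(correct) / n_annotated if n_annotated else 0.0,
        "specificity": len(correct) / n_predicted if n_predicted else 0.0,
        "missing_exons": len(missing) / n_annotated if n_annotated else 0.0,
        "wrong_exons": len(wrong) / n_predicted if n_predicted else 0.0,
    }


# Hypothetical coordinates, for illustration only.
example_annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
example_predicted = [(100, 200), (310, 400), (900, 950)]
example_result = exon_level_metrics(example_annotated, example_predicted)
```

In this made-up example, one of four annotated exons is predicted exactly and two are not overlapped by any prediction. The prediction (310, 400) overlaps an annotated exon without matching its boundaries, so it counts toward neither the correct nor the missing exons, which is roughly the situation the abstract describes as locating exons approximately.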
Pages: 901-912
Number of pages: 12