Genie: Gene finding in Drosophila melanogaster

Cited by: 106
Authors
Reese, MG
Kulp, D
Tammana, H
Haussler, D
Affiliations
[1] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley Drosophila Genome Project, Berkeley, CA 94720 USA
[2] Neomorph Inc, Berkeley, CA 94710 USA
[3] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
DOI
10.1101/gr.10.4.529
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
A hidden Markov model-based gene-finding system called Genie was applied to the genomic Adh region in Drosophila melanogaster as part of the Genome Annotation Assessment Project (GASP). Predictions from three versions of the Genie gene-finding system were submitted: one based on statistical properties of coding genes, a second that included EST alignment information, and a third that integrated protein sequence homology information. All three programs were trained on the provided Drosophila training data. In addition, promoter assignments from an integrated neural network were submitted. The gene assignments overlapped >90% of the 222 annotated genes, and 26 possibly novel genes were predicted, some of which might be overpredictions. The system correctly identified the exon boundaries of 70% of the exons in cDNA-confirmed genes, and of 77% of the exons with the addition of EST sequence alignments. The best of the three Genie submissions predicted 19 of the 43 annotated gene structures entirely correctly (44%). In the promoter category, only 30% of the transcription start sites could be detected, but by integrating this program as a sensor into Genie the false-positive rate could be reduced to 1/16,786 (0.006%). The results of the experiment on the long contiguous genomic sequence revealed some problems concerning gene assembly in Genie, and these results were used to improve the system. We show that Genie is a robust hidden Markov model system that allows for a generalized integration of information from different sources such as signal sensors (splice sites, start codon, etc.), content sensors (exons, introns, intergenic regions), and alignments of mRNA, EST, and peptide sequences. The assessment showed that Genie could be used effectively for the annotation of complete genomes from higher organisms.
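The two percentages quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the helper name `percent` is hypothetical, and the inputs are just the counts stated in the abstract (19 of 43 gene structures, and a false-positive rate of 1 in 16,786).

```python
def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

# Counts taken directly from the abstract.
exact_gene_rate = percent(19, 43)        # gene structures predicted entirely correctly
false_positive_rate = percent(1, 16786)  # promoter false-positive rate (1 in 16,786)

print(f"{exact_gene_rate:.0f}%")      # rounded to a whole percent
print(f"{false_positive_rate:.3f}%")  # rounded to three decimal places
```

Rounded this way, the values agree with the 44% and 0.006% reported above.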
Pages: 529-538
Page count: 10