Gene prediction and verification in a compact genome with numerous small introns

Cited by: 28
Authors
Tenney, AE
Brown, RH
Vaske, C
Lodge, JK
Doering, TL
Brent, MR [1]
Affiliations
[1] Washington Univ, Lab Computat Genom, St Louis, MO 63130 USA
[2] Washington Univ, Dept Comp Sci, St Louis, MO 63130 USA
[3] St Louis Univ, Sch Med, Dept Biochem & Mol Biol, St Louis, MO 63104 USA
[4] Washington Univ, Sch Med, Dept Mol Microbiol, St Louis, MO 63110 USA
DOI
10.1101/gr.2816704
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The genomes of clusters of related eukaryotes are now being sequenced at an increasing rate, creating a need for accurate, low-cost annotation of exon-intron structures. In this paper, we demonstrate that reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on predicted gene structures satisfy this need, at least for single-celled eukaryotes. The TWINSCAN gene prediction algorithm was adapted for the fungal pathogen Cryptococcus neoformans by using a precise model of intron lengths in combination with ungapped alignments between the genome sequences of the two closely related Cryptococcus varieties. This approach resulted in approximately 60% of known genes being predicted exactly right at every coding base and splice site. When previously unannotated TWINSCAN predictions were tested by RT-PCR and direct sequencing, 75% of targets spanning two predicted introns were amplified and produced high-quality sequence. When targets spanning the complete predicted open reading frame were tested, 72% of them amplified and produced high-quality sequence. We conclude that sequencing a small number of expressed sequence tags (ESTs) to provide training data, running TWINSCAN on an entire genome, and then performing RT-PCR and direct sequencing on all of its predictions would be a cost-effective method for obtaining an experimentally verified genome annotation.
Pages: 2330-2335
Page count: 6