Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions

Cited by: 29
Authors
Wei, CC [1]
Lamesch, P
Arumugam, M
Rosenberg, J
Hu, P
Vidal, M
Brent, MR
Affiliations
[1] Washington Univ, Lab Computat Genom, St Louis, MO 63130 USA
[2] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[3] Harvard Univ, Sch Med, Dana Farber Canc Inst, Ctr Canc Syst Biol, Boston, MA 02115 USA
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
DOI
10.1101/gr.3329005
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene-prediction system to C. elegans and C. briggsae and to improve its splice-site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase "predicted" genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the "parts list" for even the best-studied model organisms.
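The sensitivity and specificity figures in the abstract refer to exact ORF matches. As a minimal sketch of how such figures are typically computed, the Python below assumes the convention common in gene-prediction work, where sensitivity is the fraction of annotated ORFs predicted exactly and specificity is the fraction of predicted ORFs that match an annotation exactly. The function name, the tuple encoding of an ORF, and the coordinates are hypothetical illustrations, not data or code from the paper.

```python
def exact_orf_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for exact ORF matches.

    Each ORF is encoded as a tuple of (start, end) coding-exon
    coordinates. This encoding is an assumption for illustration;
    a prediction counts only if every exon boundary matches.
    """
    predicted = set(predicted)
    annotated = set(annotated)
    exact = predicted & annotated  # ORFs reproduced boundary-for-boundary

    # Sensitivity: share of annotated ORFs that were predicted exactly.
    sensitivity = len(exact) / len(annotated) if annotated else 0.0
    # Specificity (gene-prediction sense): share of predictions that are exact.
    specificity = len(exact) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy coordinates, invented for this example only.
annotated = [((1, 10), (20, 30)), ((50, 80),), ((100, 120), (130, 150))]
predicted = [((1, 10), (20, 30)), ((50, 85),), ((200, 240),),
             ((100, 120), (130, 150))]

sn, sp = exact_orf_accuracy(predicted, annotated)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")

# The cloning success rate in the abstract follows the same arithmetic:
# 146 confirmed ORFs out of 265 attempted.
success_rate = 146 / 265
print(f"success rate={success_rate:.0%}")
```

In the toy data, one prediction misses an exon boundary by a few bases and therefore counts against both measures, which is what makes the exact-match criterion strict.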
Pages: 577-582
Page count: 6
Related papers
24 in total
[1]  
[Anonymous], 1998, SCIENCE, V282, P2012
[2]   Recent advances in gene structure prediction [J].
Brent, MR ;
Guigó, R .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2004, 14 (03) :264-272
[3]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[4]   A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution [J].
Cho, SC ;
Jin, SW ;
Cohen, A ;
Ellis, RE .
GENOME RESEARCH, 2004, 14 (07) :1207-1220
[5]   Leveraging the mouse genome for gene prediction in human: From whole-genome shotgun reads to a global synteny map [J].
Flicek, P ;
Keibler, E ;
Hu, P ;
Korf, I ;
Brent, MR .
GENOME RESEARCH, 2003, 13 (01) :46-54
[6]  
GROSS SS, 2005, IN PRESS RECOMB
[7]   Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes [J].
Guigó, R ;
Dermitzakis, ET ;
Agarwal, P ;
Ponting, CP ;
Parra, G ;
Reymond, A ;
Abril, JF ;
Keibler, E ;
Lyle, R ;
Ucla, C ;
Antonarakis, SE ;
Brent, MR .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (03) :1140-1145
[8]   WormBase: a multi-species resource for nematode biology and genomics [J].
Harris, TW ;
Chen, NS ;
Cunningham, F ;
Tello-Ruiz, M ;
Antoshechkin, I ;
Bastiani, C ;
Bieri, T ;
Blasiar, D ;
Bradnam, K ;
Chan, J ;
Chen, CK ;
Chen, WJ ;
Davis, P ;
Kenny, E ;
Kishore, R ;
Lawson, D ;
Lee, R ;
Muller, HM ;
Nakamura, C ;
Ozersky, P ;
Petcherski, A ;
Rogers, A ;
Sabo, A ;
Schwarz, EM ;
Van Auken, K ;
Wang, QH ;
Durbin, R ;
Spieth, J ;
Sternberg, PW ;
Stein, LD .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D411-D417
[9]   DNA cloning using in vitro site-specific recombination [J].
Hartley, JL ;
Temple, GF ;
Brasch, MA .
GENOME RESEARCH, 2000, 10 (11) :1788-1795
[10]   GAZE: A generic framework for the integration of gene-prediction data by dynamic programming [J].
Howe, KL ;
Chothia, T ;
Durbin, R .
GENOME RESEARCH, 2002, 12 (09) :1418-1427