Genome annotation past, present, and future: How to define an ORF at each locus

Cited by: 88
Authors
Brent, MR [1]
Affiliations
[1] Washington Univ, Lab Computat Gen, St Louis, MO 63130 USA
[2] Washington Univ, Dept Comp Sci, St Louis, MO 63130 USA
DOI
10.1101/gr.3866105
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Driven by competition, automation, and technology, the genomics community has far exceeded its ambition to sequence the human genome by 2005. By analyzing mammalian genomes, we have shed light on the history of our DNA sequence, determined that alternatively spliced RNAs and retroposed pseudogenes are incredibly abundant, and glimpsed the apparently huge number of non-coding RNAs that play significant roles in gene regulation. Ultimately, genome science is likely to provide comprehensive catalogs of these elements. However, the methods we have been using for most of the last 10 years will not yield even one complete open reading frame (ORF) for every gene, the first plateau on the long climb toward a comprehensive catalog. These strategies (sequencing randomly selected cDNA clones, aligning protein sequences identified in other organisms, sequencing more genomes, and manual curation) will have to be supplemented by large-scale amplification and sequencing of specific predicted mRNAs. The steady improvements in gene prediction that have occurred over the last 10 years have increased the efficacy of this approach and decreased its cost. In this Perspective, I review the state of gene prediction roughly 10 years ago, summarize the progress that has been made since, argue that the primary ORF identification methods we have relied on so far are inadequate, and recommend a path toward completing the Catalog of Protein Coding Genes, Version 1.0.
Pages: 1777-1786
Page count: 10