Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes

Times cited: 111
Authors
Lin, Michael F.
Carlson, Joseph W.
Crosby, Madeline A.
Matthews, Beverley B.
Yu, Charles
Park, Soo
Wan, Kenneth H.
Schroeder, Andrew J.
Gramates, L. Sian
St. Pierre, Susan E.
Roark, Margaret
Wiley, Kenneth L., Jr.
Kulathinal, Rob J.
Zhang, Peili
Myrick, Kyl V.
Antone, Jerry V.
Celniker, Susan E.
Gelbart, William M.
Kellis, Manolis [1]
Affiliations
[1] Harvard Univ, Dept Mol & Cell Biol, Cambridge, MA 02138 USA
[2] Harvard Univ, MIT, Broad Inst, Cambridge, MA 02139 USA
[3] Lawrence Berkeley Natl Lab, Dept Genom Biol, Div Life Sci, Berkeley Drosophila Genome Project, Berkeley, CA 94720 USA
[4] Harvard Univ, Flybase Biol Labs, Cambridge, MA 02138 USA
[5] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
DOI
10.1101/gr.6679507
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect > 10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster.
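The abstract does not spell out the individual metrics, so what follows is only an illustrative sketch of one widely used kind of protein-coding evolutionary signature, under our own assumptions: in alignments of genuine coding regions, insertions and deletions tend to be multiples of three nucleotides, so the reading frame is preserved. The helper names (`gap_runs`, `frame_preserving_fraction`) and the scoring rule are hypothetical and are not taken from the paper.

```python
import re
from typing import List


def gap_runs(aligned: str) -> List[int]:
    """Return the lengths of maximal runs of '-' in one aligned row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def frame_preserving_fraction(reference: str, informant: str) -> float:
    """Fraction of gap runs (in either row) whose length is a multiple of 3.

    Hypothetical scoring rule, assumed for illustration only: coding
    alignments tend to contain frame-preserving indels, noncoding ones
    do not. Returns 1.0 when the pairwise alignment has no gaps at all.
    """
    if len(reference) != len(informant):
        raise ValueError("aligned rows must have equal length")
    runs = gap_runs(reference) + gap_runs(informant)
    if not runs:
        return 1.0
    preserving = sum(1 for length in runs if length % 3 == 0)
    return preserving / len(runs)


if __name__ == "__main__":
    ref = "ATGGCC---AAATTTGGG"
    inf = "ATGGCCCCCAAA--TGGG"
    print(frame_preserving_fraction(ref, inf))
```

For the two rows above the score is 0.5: the 3-nt gap preserves the frame, while the 2-nt gap does not. A realistic pipeline would aggregate such evidence across the alignments of all 12 species and combine it with other signatures; this sketch scores a single pairwise alignment only.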
Pages: 1823-1836
Number of pages: 14