Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes

Cited by: 36
Authors
Lin, Michael F. [1, 2]
Deoras, Ameya N. [3]
Rasmussen, Matthew D. [3]
Kellis, Manolis [1, 2, 3]
Affiliations
[1] MIT, Broad Inst, Cambridge, MA 02139 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Harvard Univ, MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
DOI
10.1371/journal.pcbi.1000067
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (<= 240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.
Pages: 14