EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

Times cited: 892
Authors
Vilella, Albert J. [1]
Severin, Jessica [1]
Ureta-Vidal, Abel [1]
Heng, Li [2]
Durbin, Richard [2]
Birney, Ewan [1]
Affiliations
[1] EMBL-EBI, Cambridge CB10 1SD, England
[2] Wellcome Trust Sanger Institute, Cambridge CB10 1HH, England
Funding
Wellcome Trust (UK)
Keywords
MAXIMUM-LIKELIHOOD; GENOME SEQUENCE; DATABASE; EVOLUTION; INSIGHTS; ALGORITHM; ORTHOLOGS; FAMILIES; PARALOGS
DOI
10.1101/gr.073585.107
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We have developed a comprehensive, gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree-building methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach with clustering approaches for ortholog prediction, showing a large increase in coverage with the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
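The abstract contrasts phylogenetic ortholog prediction with clustering approaches but does not show how orthologs are read off a duplication-aware gene tree. The Python below is a minimal sketch under stated assumptions, not the EnsemblCompara or TreeBeST code: it labels an internal node as a duplication when two of its child subtrees share a species (the species-overlap rule, used here as a simpler stand-in for reconciling the gene tree against a species tree), and it reports two genes as orthologs when their last common ancestor is a speciation node. The `Node` class, the helper functions and the gene names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations, product


@dataclass
class Node:
    """Hypothetical rooted gene-tree node; leaves carry a gene name and its species."""
    name: str = ""
    species: str = ""
    children: list[Node] = field(default_factory=list)


def leaf(name: str, species: str) -> Node:
    return Node(name=name, species=species)


def leaves(node: Node) -> list[Node]:
    """All leaf (gene) nodes below `node`, or `node` itself if it is a leaf."""
    if not node.children:
        return [node]
    return [g for c in node.children for g in leaves(c)]


def species_set(node: Node) -> set[str]:
    return {g.species for g in leaves(node)}


def is_duplication(node: Node) -> bool:
    """Species-overlap rule (an assumption, not full species-tree reconciliation):
    a node is a duplication if two child subtrees contain the same species."""
    return any(species_set(a) & species_set(b)
               for a, b in combinations(node.children, 2))


def orthologs(root: Node) -> set[frozenset[str]]:
    """Gene pairs whose last common ancestor is a speciation node."""
    pairs: set[frozenset[str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        if not is_duplication(node):
            # Genes taken from different child subtrees have `node` as their LCA.
            for a, b in combinations(node.children, 2):
                for x, y in product(leaves(a), leaves(b)):
                    pairs.add(frozenset((x.name, y.name)))
        stack.extend(node.children)
    return pairs


if __name__ == "__main__":
    # One human gene and two mouse copies from a mouse-specific duplication.
    tree = Node(children=[leaf("h1", "human"),
                          Node(children=[leaf("m1", "mouse"), leaf("m2", "mouse")])])
    print(sorted(sorted(p) for p in orthologs(tree)))
    # → [['h1', 'm1'], ['h1', 'm2']]
```

In this toy tree the mouse-specific duplication yields a one-to-many relationship (h1 is orthologous to both m1 and m2), which a clustering method that reports only one-to-one reciprocal best hits would not capture in full.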
Pages: 327-335
Number of pages: 9