RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs

Cited by: 129
Authors
Zmasek, CM
Eddy, SR [1]
Affiliations
[1] Washington Univ, Sch Med, Howard Hughes Med Inst, St Louis, MO 63110 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
Keywords
DOI
10.1186/1471-2105-3-14
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees. Results: Here we present RIO (Resampled Inference of Orthologs), a procedure for automated phylogenomics using explicit phylogenetic inference. RIO analyses are performed over bootstrap resampled phylogenetic trees to estimate the reliability of orthology assignments. We also introduce supplementary concepts that are helpful for functional inference. RIO has been implemented as a Perl pipeline connecting several C and Java programs. It is available at [http://www.genetics.wustl.edu/eddy/forester/]. A web server is at [http://www.rio.wustl.edu/]. RIO was tested on the Arabidopsis thaliana and Caenorhabditis elegans proteomes. Conclusion: The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies. We also describe how some orthologies can be misleading for functional inference.
Pages: 19