Phylogeny determined by protein domain content

Cited by: 170
Authors
Yang, S
Doolittle, RF
Bourne, PE [1]
Affiliations
[1] Univ Calif San Diego, Dept Pharmacol, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, San Diego Supercomp Ctr, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Dept Chem & Biochem, La Jolla, CA 92093 USA
Keywords
fold superfamily
DOI
10.1073/pnas.0408810102
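Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code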
07 ; 0710 ; 09 ;
Abstract
A simple classification scheme that uses only the presence or absence of a protein domain architecture has been used to determine the phylogeny of 174 complete genomes. The method correctly divides the 174 taxa into Archaea, Bacteria, and Eukarya and satisfactorily sorts most of the major groups within these superkingdoms. The most challenging problem involved 119 Bacteria, many of which have reduced genomes. When a weighting factor was used that takes account of differences in genome size (number of considered folds), small-genome taxa were mostly grouped with their full-sized counterparts. Although not every organism appears exactly at its classical phylogenetic position in these trees, the agreement appears comparable with the efforts of others using sophisticated sequence analysis and/or combinations of gene content and gene order. During the course of the study, it emerged that there is a core set of approximately 50 folds that is found in all 174 genomes and a single fold diagnostic of all Archaea.
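The presence/absence idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's distance formula or its weighting factor, so everything here is an assumption made for illustration: genomes are represented as sets of hypothetical fold identifiers, the unweighted distance is a plain Jaccard distance, and the "size-weighted" variant simply normalizes the shared-fold count by the smaller fold repertoire. The names `fold_distance` and `distance_matrix` and the toy data are likewise hypothetical.

```python
from itertools import combinations


def fold_distance(folds_a, folds_b, size_weighted=True):
    """Distance between two genomes, each given as a set of fold identifiers."""
    shared = len(folds_a & folds_b)
    if size_weighted:
        # Assumed weighting (not taken from the paper): normalize by the
        # smaller repertoire, so a reduced genome is not penalized for
        # folds it has lost relative to a larger relative.
        denom = min(len(folds_a), len(folds_b))
    else:
        # Plain Jaccard distance: normalize by the union of both repertoires.
        denom = len(folds_a | folds_b)
    return 1.0 - shared / denom if denom else 0.0


def distance_matrix(genomes, size_weighted=True):
    """Symmetric pairwise distances, keyed by (name_a, name_b)."""
    names = sorted(genomes)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = fold_distance(genomes[a], genomes[b], size_weighted)
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Toy data: a full-sized genome, a reduced relative holding a subset of
# its folds, and an unrelated genome of the same size as the first.
genomes = {
    "full_genome": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "reduced_relative": {"f1", "f2", "f3"},
    "unrelated": {"f1", "f7", "f8", "f9", "f10", "f11"},
}

weighted = distance_matrix(genomes, size_weighted=True)
unweighted = distance_matrix(genomes, size_weighted=False)
```

In this toy case the reduced genome sits at distance 0.5 from its full-sized relative under the unweighted measure but at distance 0.0 under the assumed size-aware one, which mirrors the abstract's observation that accounting for genome size pulls small-genome taxa toward their full-sized counterparts. A matrix like this could then be passed to any standard distance-based tree-building method.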
Pages: 373-378
Number of pages: 6