Using homolog groups to create a whole-genomic tree of free-living organisms: An update

Cited by: 68
Authors
House, CH
Fitz-Gibbon, ST
Affiliations
[1] Penn State Univ, Penn State Astrobiol Res Ctr, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Geosci, University Pk, PA 16802 USA
[3] Univ Calif Los Angeles, IGPP Ctr Astrobiol, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Microbiol & Mol Genet, Los Angeles, CA 90095 USA
Keywords
homologs; tree of life; genome; archaea; bacteria
DOI
10.1007/s00239-001-0054-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Genomic trees have been constructed based on the presence and absence of families of protein-encoding genes observed in 27 complete genomes, including genomes of 15 free-living organisms. This method does not rely on the identification of suspected orthologs in each genome, nor on the specific alignment used to compare gene sequences, because the protein-encoding gene families are formed by grouping any protein with a pairwise similarity score greater than a preset value. Because of this all-inclusive grouping, this method is resilient to some effects of lateral gene transfer because transfers of genes are masked when the recipient genome already has a homolog (not necessarily an ortholog) of the incoming gene. Of 71 genes suspected to have been laterally transferred to the genome of Aeropyrum pernix, only approximately 7 to 15 represent genes where a lateral gene transfer appears to have generated homoplasy in our character dataset. The genomic tree of the 15 free-living taxa includes six different bacterial orders, six different archaeal orders, and two different eukaryotic kingdoms. The results are remarkably similar to results obtained by analysis of rRNA. Inclusion of the other 12 genomes resulted in a tree only broadly similar to that suggested by rRNA, with at least some of the differences due to artifacts caused by the small genome size of many of these species. Very small genomes, such as those of the two Mycoplasma genomes included, fall to the base of the Bacterial domain, a result expected due to the substantial gene loss inherent to these lineages. Finally, artificial "partial genomes" were generated by randomly selecting ORFs from the complete genomes in order to test our ability to recover the tree generated by the whole genome sequences when only partial data are available. The results indicated that partial genomic data, when sampled randomly, could robustly recover the tree generated by the whole genome sequences.
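The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "grouping any protein with a pairwise similarity score greater than a preset value" behaves like single-linkage clustering, and the protein IDs, scores, `THRESHOLD` value, and function names are all hypothetical. It builds gene families with a union-find structure and then derives the presence/absence characters per genome.

```python
from itertools import combinations


def group_families(proteins, similar):
    """Single-linkage grouping (an assumption): proteins joined by any chain
    of above-threshold pairwise hits end up in the same family."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(proteins, 2):
        if similar(a, b):
            parent[find(a)] = find(b)

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


def presence_absence(families, genome_of):
    """One row per genome, one 0/1 character per gene family."""
    genomes = sorted(set(genome_of.values()))
    return {
        g: [int(any(genome_of[p] == g for p in fam)) for fam in families]
        for g in genomes
    }


# Toy data (hypothetical): protein ID -> genome, plus made-up similarity scores.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}
scores = {
    frozenset(("a1", "b1")): 120,
    frozenset(("b1", "c1")): 95,
    frozenset(("a2", "c2")): 30,
}
THRESHOLD = 50  # hypothetical cutoff; the abstract only says "a preset value"


def similar(a, b):
    return scores.get(frozenset((a, b)), 0) > THRESHOLD


families = group_families(sorted(genome_of), similar)
matrix = presence_absence(families, genome_of)
```

In this toy case `a1`, `b1`, and `c1` fall into one family even though `a1` and `c1` have no direct hit, which is the all-inclusive behavior the abstract relies on: a gene transferred into a genome that already holds a member of the family leaves that genome's character unchanged. A matrix of this kind could then be passed to a tree-building method of choice.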
Pages: 539-547
Page count: 9