Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis

Cited by: 148
Authors
Tekaia, F
Yeramian, E
Dujon, B
Affiliations
[1] Inst Pasteur, URA 2171 CNRS, Unite Genet Mol Levures, F-75724 Paris 15, France
[2] Univ Paris 06, Inst Pasteur, UFR927, F-75724 Paris 15, France
[3] Inst Pasteur, Ctr Bioinformat, F-75724 Paris 15, France
Keywords
hyperthermophiles; mesophiles; thermostability; amino acid composition; evolution; multivariate analyses
DOI
10.1016/S0378-1119(02)00871-5
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Can we infer the lifestyle of an organism from the characteristic properties of its genome? More precisely, what are the relations between easily quantifiable properties of genomic sequences, such as amino acid compositions, and more subtle characteristics concerning, for example, lifestyles or evolutionary trends? Here, we seek a global picture for such properties, based on a large number (56) of complete genomes, including significant numbers of representatives from the three domains of life. We consider the amino acid compositions of the predicted proteomes, and we use correspondence analysis as a multivariate method to extract the relevant information from the large-scale data. From these analyses we derive a series of conclusions concerning lifestyles, as well as physico-chemical and evolutionary trends: (1) Correspondence analysis of the amino acid compositions permits discrimination between the three known lifestyles (mesophily/thermophily/hyperthermophily). (2) For various organisms, amino acid composition properties are essentially driven by GC content, and to a significantly lesser extent by growth temperatures associated with lifestyles. Roughly speaking, the respective contributions of these two components are 57% and 20%. It is notable that these proportions are essentially unchanged with respect to a previous analysis (Nature 393 (1998) 537), which involved only the 15 genomes available at the time. (3) In terms of amino acid compositional biases, two specific 'signatures' for thermophily (in a broad sense, including hyperthermophily) can be detected. First, thermophilic species display a relative abundance of glutamic acid (Glu), concomitant with a depletion in glutamine. Second, in thermophilic species, the relative abundance of Glu (negative charge) is significantly correlated (Pearson correlation coefficient r = 0.83, P < 0.0001) with the increase in the lumped 'pool' lysine + arginine (positive charges). This correlation (absent in mesophiles) could be interpreted on a physico-chemical basis, relevant to the thermostability of proteins. (4) Statistically significant differences are observed between the average lengths of the genes in the surveyed species, and these differences follow the distribution of the species among the three domains of life. A significant difference is also observed between the average lengths of thermophilic (283.0 ± 5.8) versus mesophilic (340 ± 9.4) genes. It is thus possible that the 'general' shortening of the primary sequences in thermophilic proteins plays a role in thermostability. (5) Considering various combinations of conservation properties (genes conserved exclusively in eukaryotes, in archaea, in bacteria, in combinations of two domains, etc.), correspondence analysis reveals a trend towards thermophilic-hyperthermophilic profiles for the most conserved subset of genes (ancient genes). (6) When limited to the subset of species-specific genes, correspondence analysis leads to a different picture for the clustering of genomes according to amino acid compositions: for example, the 'core' specific part of a genome can bear lifestyle signatures different from those of the complete genome. Various results are discussed on both methodological and biological grounds. The evolutionary perspectives opened by our analyses are noted. © 2002 Published by Elsevier Science B.V.
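The abstract names correspondence analysis as the method but gives no computational details. As a rough illustration only, the sketch below applies the textbook SVD formulation of correspondence analysis to a genomes-by-amino-acids count table using NumPy. The function names (`composition_counts`, `correspondence_analysis`), the ordering of the 20-letter alphabet, and the randomly generated counts are assumptions made for this example; they are not the authors' software or data, and the printed fractions have no biological meaning.

```python
import numpy as np

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_counts(proteome):
    """Count each standard amino acid over a list of protein sequences."""
    joined = "".join(proteome).upper()
    return np.array([joined.count(aa) for aa in AMINO_ACIDS], dtype=float)


def correspondence_analysis(table):
    """Correspondence analysis of a non-negative count table.

    Rows are genomes, columns are amino acids. Every row and column must
    have a non-zero total. Returns the row principal coordinates, the
    column principal coordinates, and the fraction of total inertia
    carried by each axis (largest first).
    """
    table = np.asarray(table, dtype=float)
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)        # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    return row_coords, col_coords, inertia / inertia.sum()


# Synthetic stand-in for a (genomes x amino acids) table: 6 made-up
# "genomes" with random counts. Real input would come from
# composition_counts() applied to each predicted proteome.
rng = np.random.default_rng(0)
toy_table = rng.integers(50, 500, size=(6, len(AMINO_ACIDS)))

rows, cols, explained = correspondence_analysis(toy_table)
print("Fraction of inertia on the first two axes:", explained[:2].round(3))
print("Genome coordinates on axes 1 and 2:")
print(rows[:, :2].round(4))
```

In a setting like the one the abstract describes, the fractions returned in `explained` play the role of the per-axis contributions, and plotting the first two columns of `rows` would show how the genomes cluster. Whether a given axis tracks GC content or growth temperature has to be checked against external annotations; the decomposition itself does not label the axes.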
Pages: 51-60
Number of pages: 10
Related papers
30 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Thermal stability and protein structure [J].
ARGOS, P ;
ROSSMANN, MG ;
GRAU, UM ;
ZUBER, H ;
FRANK, G ;
TRATSCHIN, JD .
BIOCHEMISTRY, 1979, 18 (25) :5698-5703
[3]   Benzecri J-P, 1973, ANAL CORRESPONDANCES, VII
[4]   Structural and genomic correlates of hyperthermostability [J].
Cambillau, C ;
Claverie, JM .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2000, 275 (42) :32383-32386
[5]   Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence [J].
Cole, ST ;
Brosch, R ;
Parkhill, J ;
Garnier, T ;
Churcher, C ;
Harris, D ;
Gordon, SV ;
Eiglmeier, K ;
Gas, S ;
Barry, CE ;
Tekaia, F ;
Badcock, K ;
Basham, D ;
Brown, D ;
Chillingworth, T ;
Connor, R ;
Davies, R ;
Devlin, K ;
Feltwell, T ;
Gentles, S ;
Hamlin, N ;
Holroyd, S ;
Hornby, T ;
Jagels, K ;
Krogh, A ;
McLean, J ;
Moule, S ;
Murphy, L ;
Oliver, K ;
Osborne, J ;
Quail, MA ;
Rajandream, MA ;
Rogers, J ;
Rutter, S ;
Seeger, K ;
Skelton, J ;
Squares, R ;
Squares, S ;
Sulston, JE ;
Taylor, K ;
Whitehead, S ;
Barrell, BG .
NATURE, 1998, 393 (6685) :537-+
[6]   The complete genome of the hyperthermophilic bacterium Aquifex aeolicus [J].
Deckert, G ;
Warren, PV ;
Gaasterland, T ;
Young, WG ;
Lenox, AL ;
Graham, DE ;
Overbeek, R ;
Snead, MA ;
Keller, M ;
Aujay, M ;
Huber, R ;
Feldman, RA ;
Short, JM ;
Olsen, GJ ;
Swanson, RV .
NATURE, 1998, 392 (6674) :353-358
[7]   Psychrophiles and polar regions [J].
Deming, JW .
CURRENT OPINION IN MICROBIOLOGY, 2002, 5 (03) :301-309
[8]   Correspondence analysis applied to microarray data [J].
Fellenberg, K ;
Hauser, NC ;
Brors, B ;
Neutzner, A ;
Hoheisel, JD ;
Vingron, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (19) :10781-10786
[9]   A hot topic: The origin of hyperthermophiles [J].
Forterre, P .
CELL, 1996, 85 (06) :789-792
[10]   A nonhyperthermophilic common ancestor to extant life forms [J].
Galtier, N ;
Tourasse, N ;
Gouy, M .
SCIENCE, 1999, 283 (5399) :220-221