QUANTILE DISTRIBUTIONS OF AMINO-ACID USAGE IN PROTEIN CLASSES

Cited by: 36
Authors
KARLIN, S [1]
BLAISDELL, BE [1]
BUCHER, P [1]
Affiliation
[1] SWISS INST EXPTL CANC RES,CH-1066 EPALINGES,SWITZERLAND
Source
PROTEIN ENGINEERING | 1992 / Vol. 5 / No. 08
Keywords
AMINO ACID USAGES; QUANTILE DISTRIBUTIONS; WEAK AND STRONG AMINO ACID CODON TYPES
DOI
10.1093/protein/5.8.729
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A comparative study of the compositional properties of various protein sets from both cellular and viral organisms is presented. Invariants and contrasts of amino acid usages have been discerned for different protein function classes and for different species using robust statistical methods based on quantile distributions and stochastic ordering relationships. In addition, a quantitative criterion to assess amino acid compositional extremes relative to a reference protein set is proposed and applied. Invariants of amino acid usage relate mainly to the central range of quantile distributions, whereas contrasts occur mainly in the tails of the distributions, especially contrasts between eukaryote and prokaryote species. Influences from genomic constraint are evident, for example, in the arginine:lysine ratios and the usage frequencies of residues encoded by G + C-rich versus A + T-rich codon types. The structurally similar amino acids, glutamate versus aspartate and phenylalanine versus tyrosine, show stochastic dominance relationships for most species protein sets, favoring glutamate and phenylalanine, respectively. The quantile distribution of hydrophobic amino acid usages in prokaryote data dominates the corresponding quantile distribution in human data. In contrast, glutamate, cysteine, proline and serine usages in human proteins dominate the corresponding quantile distributions in Escherichia coli. E. coli dominates human in the use of basic residues, but no dominance ordering applies to acidic residues. The discussion centers on commonalities and anomalies of the amino acid compositional spectrum in relation to species, function, cellular localization, biochemical and steric attributes, complexity of the amino acid biosynthetic pathway, amino acid relative abundances and founder effects.
Pages: 729 - 738
Number of pages: 10