GRAPHIC ANALYSIS OF CODON USAGE STRATEGY IN 1490 HUMAN PROTEINS

Cited by: 50
Authors
ZHANG, CT [1]
CHOU, KC [1]
Affiliation
[1] UPJOHN CO, RES LABS, KALAMAZOO, MI 49001
Source
JOURNAL OF PROTEIN CHEMISTRY | 1993 / Vol. 12 / Issue 03
Keywords
CODON POSITION; DNA BASES; CHARACTERISTIC INEQUALITY; MAPPING POINT;
DOI
10.1007/BF01028195
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The frequencies of bases A (adenine), C (cytosine), G (guanine), and T (thymine) occurring in codon position $i$, denoted by $a_i$, $c_i$, $g_i$, and $t_i$, respectively ($i = 1, 2, 3$), have been calculated and diagrammatized for the 1490 human proteins in the codon usage table for primate genes compiled recently. Based on the characteristic graphs thus obtained, an overall picture of codon base distribution has been provided, and the relevant biological implications are discussed. For the first codon position, it is shown that in most cases G is the dominant base, and that the relationship $g_1 > a_1 > c_1 > t_1$ generally holds true. For the second codon position, A is generally the dominant base and G is the base with the lowest frequency of occurrence, with the relationship $a_2 > t_2 > c_2 > g_2$. As to the third codon position, the values of $g_3 + c_3$ vary from 0.27 to 1, roughly keeping the relationship $c_3 > g_3 > a_3 = t_3$ for the majority of cases. Interestingly, if the average frequencies for bases A, C, G, and T are defined as $\bar{a} = (a_1 + a_2 + a_3)/3$, $\bar{c} = (c_1 + c_2 + c_3)/3$, $\bar{g} = (g_1 + g_2 + g_3)/3$, and $\bar{t} = (t_1 + t_2 + t_3)/3$, respectively, we find that $\bar{a}^2 + \bar{c}^2 + \bar{g}^2 + \bar{t}^2 < 1/3$ is valid almost without exception. Such a characteristic inequality might reflect some inherent rule of codon usage, although its biological implication is unclear. An important advantage of graphic methods is that they make it possible to catch essential features from a huge amount of data by direct and intuitive examination. The method used here allows one to see means and variances, and also to spot outliers. This is particularly useful for finding and classifying similarity patterns and relationships in data sets of long sequences, such as DNA coding sequences. The method also holds great potential for the study of molecular evolution from the viewpoint of the genetic code, for which data have accumulated rapidly and will continue to grow at a much faster pace.
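The quantities described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`position_frequencies`, `mean_frequencies`, `sum_of_squares`) and the short sequence `toy_cds` are hypothetical and made up for illustration. The sketch assumes an in-frame coding sequence over the alphabet A, C, G, T, computes the base frequencies at each of the three codon positions, averages them over the positions, and evaluates the sum of squared averages that the abstract compares with 1/3.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(cds):
    """Return {1: {...}, 2: {...}, 3: {...}}: base frequencies per codon position.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon
    is ignored, and characters other than A, C, G, T are not counted.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    freqs = {}
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])  # every third base from offset pos
        total = sum(counts[b] for b in BASES)
        if total == 0:
            raise ValueError("no A/C/G/T bases found at codon position %d" % (pos + 1))
        freqs[pos + 1] = {b: counts[b] / total for b in BASES}
    return freqs


def mean_frequencies(freqs):
    """Average each base frequency over the three codon positions."""
    return {b: sum(freqs[i][b] for i in (1, 2, 3)) / 3 for b in BASES}


def sum_of_squares(means):
    """Sum of the squared average frequencies of A, C, G, and T."""
    return sum(v * v for v in means.values())


# Hypothetical toy sequence (8 codons), not taken from the paper's data set.
toy_cds = "ATGGCCGAGCTGACCAAGGGCTGA"
freqs = position_frequencies(toy_cds)
means = mean_frequencies(freqs)
s = sum_of_squares(means)
print(freqs[1])       # base frequencies at the first codon position
print(means)          # average frequencies over the three positions
print(s, s < 1 / 3)   # value compared against 1/3 in the abstract
```

Because the four average frequencies sum to 1, the sum of their squares cannot fall below 1/4 (reached when all four equal 0.25), so the inequality in the abstract confines it to a narrow band between 1/4 and 1/3. A single toy sequence of course says nothing about how often the inequality holds in real genes.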
Pages: 329-335
Number of pages: 7