Interpreting correlations in biosequences

Cited by: 84
Authors
Herzel, H
Trifonov, EN
Weiss, O
Grosse, I
Affiliations
[1] Humboldt Univ, Inst Theoret Biol, D-10115 Berlin, Germany
[2] Weizmann Inst Sci, Dept Biol Struct, IL-76100 Rehovot, Israel
[3] Boston Univ, Ctr Polymer Studies, Boston, MA 02215 USA
[4] Boston Univ, Dept Phys, Boston, MA 02215 USA
Source
PHYSICA A | 1998 / Vol. 249 / Issues 1-4
Keywords
correlation function; DNA sequence; genetic code; protein sequence; hydrophobicity
DOI
10.1016/S0378-4371(97)00505-0
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Understanding the complex organization of genomes, predicting the location of genes, and predicting the possible structure of the gene products are some of the most important problems in current molecular biology. Many statistical techniques are used to address these issues. Correlation functions play a central role among them. This paper is based on an analysis of the decay of the entire 4 × 4 covariance matrix of DNA sequences. We apply this covariance analysis to human chromosomal regions, yeast DNA, and bacterial genomes and interpret the three most pronounced statistical features (long-range correlations, a period of 3, and a period of 10-11) using known biological facts about the structure of genomes. For example, we relate the slowly decaying long-range G+C correlations to dispersed repeats and CpG islands. We show quantitatively that the 3-basepair periodicity is due to the nonuniformity of the codon usage in protein coding segments. We finally show that periodicities of 10-11 basepairs in yeast DNA originate from an alternation of hydrophobic and hydrophilic amino acids in protein sequences. (C) 1998 Elsevier Science B.V. All rights reserved.
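The abstract's central object, a 4 × 4 covariance matrix evaluated as a function of distance, can be illustrated with a minimal Python sketch. This record does not give the paper's formula, so the sketch assumes the commonly used definition C_ab(k) = P_ab(k) - p_a p_b, where P_ab(k) is the frequency of nucleotide a followed by nucleotide b at distance k. The function name `covariance_matrix` and the `ALPHABET` constant are hypothetical, chosen only for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet; other symbols are not tabulated


def covariance_matrix(seq, k):
    """Return C[a][b] = P_ab(k) - p_a * p_b for nucleotides at distance k.

    Assumed definition (not taken from the paper): P_ab(k) is the relative
    frequency of symbol a at position i and symbol b at position i + k.
    """
    seq = seq.upper()
    n = len(seq) - k  # number of position pairs (i, i + k)
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    # Joint counts over all pairs separated by k positions.
    pairs = Counter(zip(seq[:n], seq[k:]))
    # Marginal counts taken from the same n pairs, so that each row and
    # column sums to zero when the sequence contains only A, C, G, T.
    left = Counter(seq[:n])
    right = Counter(seq[k:])
    return {
        a: {
            b: pairs[(a, b)] / n - (left[a] / n) * (right[b] / n)
            for b in ALPHABET
        }
        for a in ALPHABET
    }
```

On a toy sequence with an exact period of 3, such as `"ACG" * 50`, the diagonal entry `covariance_matrix(seq, 3)["A"]["A"]` is positive and the same entry at `k = 1` is negative. That is the kind of distance-dependent signature the abstract attributes to nonuniform codon usage, although real coding DNA shows a much weaker effect than this artificial example.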
Pages: 449-459
Page count: 11