LONG-RANGE CORRELATION-PROPERTIES OF CODING AND NONCODING DNA-SEQUENCES - GENBANK ANALYSIS

Cited by: 519
Authors
BULDYREV, SV
GOLDBERGER, AL
HAVLIN, S
MANTEGNA, RN
MATSA, ME
PENG, CK
SIMONS, M
STANLEY, HE
Affiliations
[1] BOSTON UNIV,DEPT PHYS,BOSTON,MA 02215
[2] HARVARD UNIV,BETH ISRAEL HOSP,SCH MED,DIV CARDIOVASC,BOSTON,MA 02215
[3] BOSTON UNIV,DEPT BIOMED ENGN,BOSTON,MA 02215
[4] BAR ILAN UNIV,DEPT PHYS,RAMAT GAN,ISRAEL
[5] UNIV PALERMO,DIPARTIMENTO ENERGET & APPLICAZ FIS,I-90128 PALERMO,ITALY
Source
PHYSICAL REVIEW E | 1995 / Vol. 51 / Issue 5
DOI
10.1103/PhysRevE.51.5084
Chinese Library Classification
O35 [Fluid mechanics]; O53 [Plasma physics]
Discipline classification codes
070204; 080103; 080704
Abstract
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33 301 coding and all 29 453 noncoding eukaryotic sequences, each of length larger than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent β = 0.00 ± 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent β is positive (0.16 ± 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases confidence in the reliability of our conclusion. © 1995 The American Physical Society.
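The abstract names two estimators, the spectral exponent β from a Fourier power spectrum S(f) ~ f^(-β) and the DFA exponent α from a fluctuation function F(n) ~ n^α. The Python sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' code. It assumes a purine-pyrimidine mapping (C/T → +1, A/G → -1), replaces the FFT with a naive discrete Fourier transform (same power spectrum, slower), fits β over wavelengths of 10 to 100 bp as in the abstract, and runs linear-detrending DFA over hypothetical box sizes of 8 to 64 bp. The names to_walk_steps, spectral_exponent, and dfa_exponent are illustrative only. For a stationary correlated sequence the two exponents are expected to satisfy β = 2α - 1, so an uncorrelated sequence gives β ≈ 0 and α ≈ 0.5.

```python
import cmath
import math
import random

# Assumed mapping (hypothetical choice for this sketch): pyrimidines C/T -> +1,
# purines A/G -> -1; any other symbol (e.g. N) is skipped.
PURINE_PYRIMIDINE = {"A": -1, "G": -1, "C": 1, "T": 1}


def to_walk_steps(seq):
    """Convert a nucleotide string into a list of +1/-1 steps."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def spectral_exponent(u, min_bp=10, max_bp=100):
    """Fit S(f) ~ f**(-beta) over wavelengths min_bp..max_bp (in bp).

    A naive O(N^2) discrete Fourier transform keeps the sketch dependency-free;
    an FFT yields the same power spectrum much faster.
    """
    n = len(u)
    log_f, log_s = [], []
    for f in range(1, n // 2 + 1):
        if min_bp <= n / f <= max_bp:  # wavelength n/f lies in the fit window
            coeff = sum(u[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            log_f.append(math.log(f))
            log_s.append(math.log(abs(coeff) ** 2))
    _, slope = linear_fit(log_f, log_s)
    return -slope


def dfa_exponent(u, box_sizes=(8, 16, 32, 64)):
    """Detrended fluctuation analysis: fit F(n) ~ n**alpha."""
    # Integrated "DNA walk": y(k) = u_1 + ... + u_k.
    y, total = [], 0
    for step in u:
        total += step
        y.append(total)
    log_n, log_F = [], []
    for n in box_sizes:
        n_boxes = len(y) // n  # non-overlapping boxes; the tail is dropped
        if n_boxes == 0:
            continue
        xs = list(range(n))
        sq_sum = 0.0
        for b in range(n_boxes):
            box = y[b * n:(b + 1) * n]
            a, slope = linear_fit(xs, box)  # local linear trend in this box
            sq_sum += sum((v - (a + slope * x)) ** 2 for x, v in zip(xs, box))
        log_n.append(math.log(n))
        log_F.append(0.5 * math.log(sq_sum / (n_boxes * n)))  # log of RMS F(n)
    _, alpha = linear_fit(log_n, log_F)
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(4096))
    u = to_walk_steps(seq)
    # For an uncorrelated sequence expect alpha near 0.5 and beta near 0.
    print("alpha (DFA):", dfa_exponent(u))
    print("beta (spectrum, first 512 bp):", spectral_exponent(u[:512]))
```

A single 512 bp periodogram is noisy, so β estimates are normally averaged over many sequences; the input must also be longer than max_bp for the spectral fit window to contain any frequencies.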
Pages: 5084-5091
Number of pages: 8
References
24 in total
  • [1] STATISTICAL-ANALYSIS OF DNA-SEQUENCES .1.
    AZBEL, MY
    KANTOR, Y
    VERKH, L
    VILENKIN, A
    [J]. BIOPOLYMERS, 1982, 21 (08) : 1687 - 1690
  • [2] RANDOM 2 COMPONENT 1 DIMENSIONAL ISING-MODEL FOR HETEROPOLYMER MELTING
    AZBEL, MY
    [J]. PHYSICAL REVIEW LETTERS, 1973, 31 (09) : 589 - 592
  • [3] GLOBAL FRACTAL DIMENSION OF HUMAN DNA-SEQUENCES TREATED AS PSEUDORANDOM WALKS
    BERTHELSEN, CL
    GLAZIER, JA
    SKOLNICK, MH
    [J]. PHYSICAL REVIEW A, 1992, 45 (12) : 8902 - 8913
  • [4] FRACTALITY OF DNA TEXTS
    BOROVK, AS
    FRANKKAMENETSKII, MD
    GROSBERG, AY
    [J]. JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 1994, 12 (03) : 655 - 669
  • [5] FRACTAL LANDSCAPES AND MOLECULAR EVOLUTION - MODELING THE MYOSIN HEAVY-CHAIN GENE FAMILY
    BULDYREV, SV
    GOLDBERGER, AL
    HAVLIN, S
    PENG, CK
    STANLEY, HE
    STANLEY, MHR
    SIMONS, M
    [J]. BIOPHYSICAL JOURNAL, 1993, 65 (06) : 2673 - 2679
  • [6] GENERALIZED LEVY-WALK MODEL FOR DNA NUCLEOTIDE-SEQUENCES
    BULDYREV, SV
    GOLDBERGER, AL
    HAVLIN, S
    PENG, CK
    SIMONS, M
    STANLEY, HE
    [J]. PHYSICAL REVIEW E, 1993, 47 (06) : 4514 - 4523
  • [7] BULDYREV SV, 1994, FRACTALS SCI, pCH2
  • [8] ASSESSMENT OF PROTEIN CODING MEASURES
    FICKETT, JW
    TUNG, CS
    [J]. NUCLEIC ACIDS RESEARCH, 1992, 20 (24) : 6441 - 6450
  • [9] CRUMPLED GLOBULE MODEL OF THE 3-DIMENSIONAL STRUCTURE OF DNA
    GROSBERG, A
    RABIN, Y
    HAVLIN, S
    NEER, A
    [J]. EUROPHYSICS LETTERS, 1993, 23 (05) : 373 - 378
  • [10] HAVLIN S, IN PRESS FRACTALS