power law distributions;
long-range correlations;
coding/non-coding DNA sequences;
DNA strand partition;
DOI:
10.1023/A:1004671119400
CLC number:
O4 [Physics];
Subject classification code:
0702 ;
Abstract:
We study the size distribution of coding and non-coding regions in DNA sequences. For most organisms we observe that the size distribution $P_c(S)$ of coding regions of size $S$ is short-range, whereas the size distribution of non-coding regions follows a power-law decay $P_{nc}(S) \sim S^{-1-\mu}$, with exponents $\mu$ indicating clear long-range behavior. Using the Generalized Central Limit Theorem, we argue that the long-range distributions observed in the non-coding regions are related to the lower-level clustering of purines and pyrimidines (1d islands), which follows similar long-range laws. We also address the clustering of coding segments in the two complementary strands of DNA. We observe short-range clustering of coding regions in both strands, expressed by an exponential decay of the clustering size distribution. The decay exponent measures the degree of short-range correlation and the deviation from random clustering.
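The abstract relies on three measurable quantities: the sizes of 1d purine/pyrimidine islands, the tail exponent $\mu$ in $P(S) \sim S^{-1-\mu}$, and the exponential decay rate of same-strand clustering sizes. The following is a minimal sketch under stated assumptions, not the authors' procedure. The function names are hypothetical. The sequence is assumed to contain only A, C, G, T, and other symbols are simply dropped. $\mu$ is estimated with an approximate discrete maximum-likelihood (Hill-type) estimator rather than any fit the paper may use. Clustering sizes are assumed to be geometric (a discrete exponential), so under purely random, equiprobable strand assignment the expected decay rate is $\ln 2$.

```python
import math


def run_lengths(labels):
    """Lengths of maximal runs of identical consecutive labels."""
    sizes = []
    prev, run = None, 0
    for lab in labels:
        if run and lab == prev:
            run += 1
        else:
            if run:
                sizes.append(run)
            prev, run = lab, 1
    if run:
        sizes.append(run)
    return sizes


def purine_pyrimidine_islands(seq):
    """Hypothetical helper: 1d island sizes, i.e. runs of purines (A, G)
    or pyrimidines (C, T). Non-ACGT symbols are dropped (an assumption)."""
    return run_lengths(base in "AG" for base in seq.upper() if base in "ACGT")


def tail_exponent_mle(sizes, s_min=1):
    """Approximate discrete MLE of mu for P(S) ~ S^(-1-mu), S >= s_min.

    Uses alpha = 1 + n / sum(ln(S / (s_min - 1/2))) with alpha = 1 + mu.
    """
    if s_min < 1:
        raise ValueError("s_min must be >= 1")
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no sizes at or above s_min")
    return len(tail) / sum(math.log(s / (s_min - 0.5)) for s in tail)


def exponential_decay_rate(sizes):
    """MLE decay rate lam for a geometric law P(n) ~ exp(-lam * n), n >= 1.

    For P(n) = (1 - p) p^(n - 1) the mean is 1 / (1 - p), so p = 1 - 1/mean
    and lam = -ln p. Random equiprobable strand choice gives lam = ln 2.
    """
    mean = sum(sizes) / len(sizes)
    if mean <= 1:
        return math.inf  # every cluster has size 1: infinitely fast decay
    return -math.log(1 - 1 / mean)
```

For example, a hypothetical strand-label sequence `"++-+++--"` (coding segments in strand order) gives `run_lengths(...) == [2, 1, 3, 2]`. Its mean cluster size is 2, so `exponential_decay_rate` returns about 0.693, i.e. $\ln 2$, which is the random-clustering value. A rate above $\ln 2$ would indicate that same-strand clusters break up faster than chance.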