Statistical analysis of the DNA sequence of human chromosome 22

Cited by: 43
Authors
Holste, D
Grosse, I
Herzel, H
Affiliations
[1] Humboldt Univ, Dept Theoret Biophys, D-10115 Berlin, Germany
[2] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[3] Humboldt Univ, Inst Theoret Biol, D-10115 Berlin, Germany
Source
PHYSICAL REVIEW E | 2001 / Vol. 64 / No. 4
DOI
10.1103/PhysRevE.64.041917
Chinese Library Classification
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the 33.4 × 10^6 nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies H_n of the frequency distribution of oligonucleotides of length n (n-mers) grow sublinearly with increasing n, indicating the presence of higher-order correlations for all of the studied lengths 1 ≤ n ≤ 10, and (iii) the generalized entropies H_n(q) of n-mers decrease monotonically with increasing q, and the decay of H_n(q) with q becomes steeper with increasing n ≤ 10, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length n increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of H_n with n, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of H_n(q) with q.
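The n-mer entropies named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: n-mers are counted with overlapping windows, entropies are measured in bits, and the generalized entropy takes the Rényi form H_n(q) = log2(sum_i p_i^q) / (1 - q). The function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
import math


def nmer_counts(seq, n):
    """Count overlapping n-mers (assumed windowing scheme) in a sequence string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical n-mer frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def renyi_entropy(counts, q):
    """Renyi entropy of order q, in bits (assumed form of the generalized entropy)."""
    if q == 1:
        # The Renyi entropy tends to the Shannon entropy as q approaches 1.
        return shannon_entropy(counts)
    total = sum(counts.values())
    return math.log2(sum((c / total) ** q for c in counts.values())) / (1 - q)


def entropy_profile(seq, max_n):
    """Return [(n, H_n)] for n = 1..max_n, to inspect how H_n grows with n."""
    return [(n, shannon_entropy(nmer_counts(seq, n))) for n in range(1, max_n + 1)]
```

For an uncorrelated sequence, H_n would grow linearly in n, so sublinear growth in `entropy_profile` is the signature the abstract attributes to higher-order correlations. Note that these plug-in estimates are biased downward when the sequence is short relative to the 4^n possible n-mers; the sketch applies no correction for that.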
Pages: 9