Statistical analysis of the DNA sequence of human chromosome 22

Times cited: 43
Authors
Holste, D
Grosse, I
Herzel, H
Affiliations
[1] Humboldt Univ, Dept Theoret Biophys, D-10115 Berlin, Germany
[2] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[3] Humboldt Univ, Inst Theoret Biol, D-10115 Berlin, Germany
Source
PHYSICAL REVIEW E | 2001 / Vol. 64 / Issue 04
DOI
10.1103/PhysRevE.64.041917
Chinese Library Classification (CLC) number
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
We study statistical patterns in the DNA sequence of human chromosome 22, the first completely sequenced human chromosome. We find that (i) the $33.4 \times 10^{6}$ nucleotide long human chromosome exhibits long-range power-law correlations over more than four orders of magnitude, (ii) the entropies $H_n$ of the frequency distribution of oligonucleotides of length $n$ ($n$-mers) grow sublinearly with increasing $n$, indicating the presence of higher-order correlations for all of the studied lengths $1 \le n \le 10$, and (iii) the generalized entropies $H_n(q)$ of $n$-mers decrease monotonically with increasing $q$ and the decay of $H_n(q)$ with $q$ becomes steeper with increasing $n \le 10$, indicating that the frequency distribution of oligonucleotides becomes increasingly nonuniform as the length $n$ increases. We investigate to what degree known biological features may explain the observed statistical patterns. We find that (iv) the presence of interspersed repeats may cause the sublinear increase of $H_n$ with $n$, and that (v) the presence of monomeric tandem repeats as well as the suppression of CG dinucleotides may cause the observed decay of $H_n(q)$ with $q$.
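The n-mer entropies mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the generalized entropy, so the Rényi form is assumed here purely for illustration; the function names and the toy sequence are hypothetical and are not taken from the paper or from chromosome 22 data.

```python
from collections import Counter
import math


def nmer_probs(seq, n):
    """Relative frequencies of overlapping n-mers in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy in bits of a list of nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs)


def renyi_entropy(probs, q):
    """Renyi entropy of order q in bits (ASSUMED form of the generalized
    entropy); q = 1 is handled as the Shannon limit."""
    if q == 1:
        return shannon_entropy(probs)
    return math.log2(sum(p ** q for p in probs)) / (1 - q)


# Toy sequence (hypothetical): a periodic block followed by a monomeric run.
toy_seq = "ACGT" * 50 + "A" * 100

if __name__ == "__main__":
    for n in (1, 2, 3):
        probs = nmer_probs(toy_seq, n)
        values = [renyi_entropy(probs, q) for q in (0, 1, 2)]
        print(n, [round(v, 3) for v in values])
```

Under this assumed definition, a nonuniform n-mer distribution gives an entropy that falls as q grows, and correlations between neighbouring symbols keep the n-mer entropy below n times the single-nucleotide entropy, which is the kind of pattern the abstract describes.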
Pages: 9