Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths

Cited by: 37
Authors
Baldi, P [1]
Baisnée, PF
Affiliations
[1] Univ Calif Irvine, Coll Med, Dept Informat & Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Coll Med, Dept Biol Chem, Irvine, CA 92697 USA
DOI
10.1093/bioinformatics/16.10.865
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: DNA structure plays an important role in a variety of biological processes. Different di- and trinucleotide scales have been proposed to capture various aspects of DNA structure, including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. Yet a general framework for the computational analysis and prediction of DNA structure is still lacking. Such a framework should in particular address the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales.
Results: We develop a general framework for sequence analysis based on additive scales, structural or other, that addresses all these issues. We show how to construct extremal sequences and calibrate scores for automatic genomic and database extraction. We show that distributions rapidly converge to normality as N increases. Pairwise correlations between scales depend both on background distribution and sequence length, and rapidly converge to an analytically predictable asymptotic value. For di- and trinucleotide scales, normal behavior and asymptotic correlation values are attained over a characteristic window length of about 10-15 bp. With a uniform background distribution, pairwise correlations between empirically derived scales remain relatively small and roughly constant at all lengths, except for propeller twist and protein deformability, which are positively correlated. There is a positive (resp. negative) correlation between dinucleotide base stacking (resp. propeller twist and protein deformability) and AT-content that increases in magnitude with length. The framework is applied to the analysis of various DNA tandem repeats. We derive exact expressions for counting the number of repeat unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.
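As an illustration of what an additive scale means in the abstract above, the following minimal Python sketch scores a sequence by summing a per-dinucleotide value over all overlapping dinucleotides, computes a sliding-window profile, and finds an extremal sequence by brute force. It is not the authors' implementation. `TOY_SCALE` is a hypothetical placeholder (it simply counts A/T bases in each dinucleotide) and does not reproduce any published structural scale; the function names are likewise assumptions made for this example.

```python
from itertools import product

# Hypothetical toy dinucleotide scale: the number of A/T bases in the pair.
# These are placeholder values, NOT a published structural scale.
TOY_SCALE = {
    a + b: sum(base in "AT" for base in (a, b))
    for a, b in product("ACGT", repeat=2)
}


def additive_score(seq, scale, k=2):
    """Sum the scale value of every overlapping k-mer in seq."""
    return sum(scale[seq[i:i + k]] for i in range(len(seq) - k + 1))


def window_profile(seq, scale, window, k=2):
    """Additive score of each length-`window` subsequence of seq."""
    return [
        additive_score(seq[i:i + window], scale, k)
        for i in range(len(seq) - window + 1)
    ]


def extremal_sequence(n, scale, k=2):
    """Brute-force a maximal-score sequence of length n (small n only)."""
    candidates = ("".join(p) for p in product("ACGT", repeat=n))
    return max(candidates, key=lambda s: additive_score(s, scale, k))


if __name__ == "__main__":
    print(additive_score("AATT", TOY_SCALE))
    print(window_profile("AATTGC", TOY_SCALE, window=4))
    print(extremal_sequence(4, TOY_SCALE))
```

The brute-force search enumerates all 4^n sequences and is only practical for short lengths; it is included solely to make the notion of an extremal sequence concrete.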
Pages: 865-889
Page count: 25