A discrete autoregressive process as a model for short-range correlations in DNA sequences

Cited by: 17
Authors
Dehnert, M
Helm, WE
Hütt, MT
Affiliations
[1] Tech Univ Darmstadt, Bioinformat Grp, D-64287 Darmstadt, Germany
[2] Univ Appl Sci, Math & Sci Fac, D-64295 Darmstadt, Germany
Keywords
long-range correlation; DNA analysis; entropy; mutual information; Markov process of higher order; discrete autoregressive process
DOI
10.1016/S0378-4371(03)00399-6
Chinese Library Classification (CLC) number
O4 [Physics]
Subject classification code
0702
Abstract
We present a direct way to model short- and medium-range correlations in DNA sequences and to separate them from long-range correlations. To this end, we discuss symbol sequences generated by a discrete autoregressive process of order p, DAR(p). These sequences display higher-order Markov properties but depend on very few parameters. The aims of our investigation are (1) to introduce DAR(p) processes as a parameter-efficient tool for generating higher-order Markov processes on a discrete alphabet, (2) to study how the parameters of the process determine the statistical properties of the sequence, and (3) to provide the mathematical tools for estimating the parameters from a given experimental sequence. The statistical properties of the generated sequences, expressed in terms of the parameters of the DAR(p) process, are monitored with methods from information theory. The implications of our findings for DNA sequences are discussed and an application is given. In particular, it is shown how short-range correlations in DNA sequences can be parameterised by such a process. (C) 2003 Elsevier B.V. All rights reserved.
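The abstract describes generating DAR(p) sequences and monitoring them with information theory only in words. The following is a minimal sketch, assuming the standard mixture form of a DAR(p) process, X_n = V_n X_{n-A_n} + (1 - V_n) Y_n, where V_n equals 1 with probability rho, the lag A_n takes the values 1, ..., p with probabilities a_1, ..., a_p, and the innovation Y_n is drawn from the marginal distribution pi. It generates such a sequence and computes a plug-in estimate of the mutual information I(k) between symbols k positions apart. The function names `simulate_dar` and `mutual_information`, the ACGT alphabet with a uniform pi, and the plug-in estimator are illustrative assumptions, not the authors' code or their estimation procedure.

```python
import math
import random
from collections import Counter
from itertools import accumulate
from typing import List, Sequence


def simulate_dar(n: int, rho: float, lag_probs: Sequence[float],
                 alphabet: Sequence[str], pi: Sequence[float],
                 seed: int = 0) -> List[str]:
    """Generate n symbols from a DAR(p) process with p = len(lag_probs).

    With probability rho the next symbol copies the symbol A_n positions back,
    where the lag A_n in {1, ..., p} is drawn with probabilities lag_probs;
    otherwise it is a fresh draw Y_n from the marginal distribution pi.
    """
    rng = random.Random(seed)
    p = len(lag_probs)
    lags = list(range(1, p + 1))
    lag_cum = list(accumulate(lag_probs))  # cumulative weights for the lag A_n
    pi_cum = list(accumulate(pi))          # cumulative weights for the innovation Y_n
    # Assumption: the first p symbols are drawn i.i.d. from pi. The marginal is
    # then pi at every position; start-up effects on joint statistics fade out.
    seq = rng.choices(alphabet, cum_weights=pi_cum, k=p)
    for _ in range(n - p):
        if rng.random() < rho:             # V_n = 1: copy a past symbol
            a = rng.choices(lags, cum_weights=lag_cum)[0]
            seq.append(seq[-a])
        else:                              # V_n = 0: draw a fresh symbol
            seq.append(rng.choices(alphabet, cum_weights=pi_cum)[0])
    return seq[:n]


def mutual_information(seq: Sequence[str], k: int) -> float:
    """Plug-in estimate (in bits) of I(k) between symbols k positions apart."""
    pairs = Counter(zip(seq, seq[k:]))     # joint counts of (X_i, X_{i+k})
    total = sum(pairs.values())
    left: Counter = Counter()
    right: Counter = Counter()
    for (a, b), c in pairs.items():        # marginal counts of each position
        left[a] += c
        right[b] += c
    mi = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi


if __name__ == "__main__":
    # Hypothetical parameters: DAR(1) on ACGT, uniform marginal, rho = 0.5.
    seq = simulate_dar(100_000, rho=0.5, lag_probs=[1.0],
                       alphabet="ACGT", pi=[0.25] * 4, seed=1)
    for k in range(1, 6):
        print(k, round(mutual_information(seq, k), 4))
```

For DAR(1) the k-step transition matrix is rho^k I + (1 - rho^k) 1 pi^T, so with rho = 0.5 and a uniform pi the exact values are I(1) ≈ 0.451 and I(2) ≈ 0.120 bits, and the estimates should decay in the same way. Passing `lag_probs=[0.0, 1.0]` instead makes every copy come from two positions back: I(1) then drops to about zero while I(2) rises to about 0.451 bits, which illustrates how the lag probabilities, and not only rho, shape the correlation profile.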
Pages: 535-553
Page count: 19