FINITE-SAMPLE EFFECTS IN SEQUENCE-ANALYSIS

Cited by: 87
Authors
HERZEL, H
SCHMITT, AO
EBELING, W
Affiliation
[1] Institut für Theoretische Physik, Humboldt-Universität zu Berlin, 10115 Berlin
DOI
10.1016/0960-0779(94)90020-5
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
This paper is devoted to the statistical analysis of symbol sequences, such as Markov strings, DNA sequences, or texts from natural languages. It is shown that entropy calculations are seriously affected by systematic errors due to the finite size of the samples. These difficulties can be dealt with by assuming simple probability distributions underlying the generating process (e.g. equidistribution, power-law distribution, exponential distribution). Analytical expressions for the dominant correction terms are derived and tested.
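The finite-sample effect described in the abstract can be illustrated with a minimal sketch, assuming an equidistributed source with a known number of states. The function names, the parameter values (`M`, `N`, `trials`, `seed`) and the first-order correction term (M - 1) / (2N) in nats (the standard Miller-Madow form) are assumptions chosen for illustration; they are not taken from the paper and may differ from the expressions derived there.

```python
import math
import random
from collections import Counter


def plugin_entropy(sample):
    """Plug-in (relative-frequency) Shannon entropy of a sample, in bits."""
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mean_plugin_entropy(num_states, sample_size, trials=50, seed=0):
    """Average plug-in entropy over several equidistributed samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(num_states) for _ in range(sample_size)]
        total += plugin_entropy(sample)
    return total / trials


# Hypothetical illustration values: 64 equiprobable states, 200 symbols per sample.
M, N = 64, 200
h_true = math.log2(M)                  # exact entropy of the uniform source
h_obs = mean_plugin_entropy(M, N)      # systematically too small for finite N
# First-order bias correction (assumed form), converted from nats to bits.
h_corr = h_obs + (M - 1) / (2 * N * math.log(2))

print(f"true: {h_true:.3f}  observed: {h_obs:.3f}  corrected: {h_corr:.3f}")
```

When the sample size is not much larger than the number of states, the observed entropy falls below the true value, and adding the correction term moves the estimate back toward it.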
Pages: 97-113
Page count: 17