String matching and 1d lattice gases

被引:2
作者
Mungan, Muhittin [1 ]
机构
[1] Bogazici Univ, Dept Phys, TR-34342 Istanbul, Turkey
[2] Gursey Inst, TR-34680 Istanbul, Turkey
关键词
pattern occurrences; combinatorics; lattice gases; theory of liquids;
D O I
10.1007/s10955-006-9247-z
中图分类号
O4 [物理学];
学科分类号
0702 ;
摘要
We calculate the probability distribution for the number of occurrences n of a given l letter word x inside a random string of k letters, whose letters have been generated by a known stationary stochastic process. Denoting by p( x) the probability of occurrence of the word, it is well-known that the distribution of occurrences in the asymptotic regime k-->infinity such that kp( x) >> 1 is Gaussian, while in the limit k --> infinity, and p( x) --> 0, such that kp( x) is finite, the distribution is Compound Poisson. It is also known that these limiting forms do not work well in the intermediate regime when kp( x) greater than or similar to 1 and k is finite. We show that the problem of calculating the probability of occurrences is equivalent to determining the configurational partition function of a 1d lattice gas of interacting particles, with the probability distribution given by the n-particle terms of the grand-partition function and the number of particles corresponding to the number of occurrences on the string. Utilizing this equivalence, we obtain the probability distribution from the equation of state of the lattice gas. Our result reproduces rather well the behavior of the distribution in the asymptotic as well as the intermediate regimes. Within the lattice gas description, the asymptotic forms of the distribution naturally emerge as certain low density approximations. Thus our approach which is based on statistical mechanics, also provides an alternative to the usual statistics based treatments employing the central limit and Chen - Stein theorems.
引用
收藏
页码:207 / 242
页数:36
相关论文
共 39 条
[1]   Rigorous location of phase transitions in hard optimization problems [J].
Achlioptas, D ;
Naor, A ;
Peres, Y .
NATURE, 2005, 435 (7043) :759-764
[2]  
[Anonymous], 1963, LECT STAT MECH
[3]  
Barbour AD, 1992, Poisson approximation
[4]   FAST STRING SEARCHING ALGORITHM [J].
BOYER, RS ;
MOORE, JS .
COMMUNICATIONS OF THE ACM, 1977, 20 (10) :762-772
[5]   A LIMIT-THEOREM ON THE NUMBER OF OVERLAPPING APPEARANCES OF A PATTERN IN A SEQUENCE OF INDEPENDENT TRIALS [J].
CHRYSAPHINOU, O ;
PAPASTAVRIDIS, S .
PROBABILITY THEORY AND RELATED FIELDS, 1988, 79 (01) :129-143
[6]   THE OCCURRENCE OF SEQUENCE PATTERNS IN REPEATED DEPENDENT EXPERIMENTS [J].
CHRYSAPHINOU, O ;
PAPASTAVRIDIS, S .
THEORY OF PROBABILITY AND ITS APPLICATIONS, 1990, 35 (01) :145-152
[7]   A LIMIT-THEOREM FOR THE NUMBER OF NON-OVERLAPPING OCCURRENCES OF A PATTERN IN A SEQUENCE OF INDEPENDENT TRIALS [J].
CHRYSSAPHINOU, O ;
PAPASTAVRIDIS, S .
JOURNAL OF APPLIED PROBABILITY, 1988, 25 (02) :428-431
[8]   STRONG LIMIT-THEOREMS OF EMPIRICAL FUNCTIONALS FOR LARGE EXCEEDANCES OF PARTIAL-SUMS OF IID VARIABLES [J].
DEMBO, A ;
KARLIN, S .
ANNALS OF PROBABILITY, 1991, 19 (04) :1737-1755
[9]  
Feller W., 1971, INTRO PROBABILITY TH
[10]  
Fisher I. Z., 1964, STAT THEORY LIQUIDS