On pattern frequency occurrences in a Markovian sequence

被引:74
作者
Regnier, M
Szpankowski, W
机构
[1] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
关键词
frequency of pattern occurrences; Markov source; autocorrelation polynomials; languages; generating functions; asymptotic analysis; large deviations;
D O I
10.1007/PL00009244
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Consider a given pattern H and a random text T generated by a Markovian source. We study the frequency of pattern occurrences in a random text when overlapping copies of the pattern are counted separately. We present exact and asymptotic formulae for moments (including the variance), and probability of r pattern occurrences for three different regions of r, namely: (i) r = O(1), (ii) central limit regime, and (iii) large deviations regime. In order to derive these results, we first construct certain language expressions that characterize pattern occurrences which are later translated into generating functions. We then use analytical methods to extract asymptotic behaviors of the pattern frequency from the generating functions. These findings are of particular interest to molecular biology problems (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition), information theory (e.g., second-order properties of the relative frequency), and pattern matching algorithms (e.g., q-gram algorithms).
引用
收藏
页码:631 / 649
页数:19
相关论文
共 42 条
[1]  
[Anonymous], P 31 ANN MARSCH IT S
[2]  
[Anonymous], 1990, MATH ANAL ALGORITHMS
[3]  
[Anonymous], 1973, J COMB THEORY A
[4]   RENEWAL THEORY FOR SEVERAL PATTERNS [J].
BREEN, S ;
WATERMAN, MS ;
ZHANG, N .
JOURNAL OF APPLIED PROBABILITY, 1985, 22 (01) :228-234
[5]   A CONTRIBUTION TO THE THEORY OF CHERNOFF BOUNDS [J].
BUCKLEW, JA ;
SADOWSKY, JS .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1993, 39 (01) :249-254
[6]  
CHRYSAPHINOU C, 1990, THEOR PROBAB APPL, V79, P167
[7]  
COWARD E, 1997, P MATH ANAL BIOL SEQ
[8]  
CROCHEMORE M, 1995, TEXT ALGORITHMS
[9]  
CSISZAR I, 1981, INFORMATION THEORY C
[10]   LARGE DEVIATIONS FOR A GENERAL-CLASS OF RANDOM VECTORS [J].
ELLIS, RS .
ANNALS OF PROBABILITY, 1984, 12 (01) :1-12