An overview on the distribution of word counts in Markov chains

Cited by: 28
Authors
Schbath, S [1]
Affiliation
[1] INRA, Biometr Unit, F-78352 Jouy En Josas, France
Keywords
word count distribution; Markovian random sequence; overlapping occurrences; renewals; clumps;
DOI
10.1089/10665270050081469
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
In this paper, we give an overview of the different results existing on the statistical distribution of word counts in a Markovian sequence of letters. Results concerning the number of overlapping occurrences, the number of renewals and the number of clumps will be presented. Counts of single words and also multiple words are considered. Most of the results are approximations as the length of the sequence tends to infinity. We will see that Gaussian approximations switch to (compound) Poisson approximations for rare words. When DNA sequences or proteins are modeled by stationary Markov chains, these results can be used to study the statistical frequency of motifs in a given sequence.
Pages: 193-201
Number of pages: 9