On the distribution of the number of missing words in random texts

被引:10
作者
Rahmann, S
Rivals, E
机构
[1] Max Planck Inst Mol Genet, Dept Computat Mol Biol, D-14195 Berlin, Germany
[2] CNRS, UMR 5506, LIRMM, F-34392 Montpellier 5, France
关键词
D O I
10.1017/S0963548302005473
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Determining the distribution of the number of empty urns after a number of balls have been thrown randomly into the urns is a classical and well understood problem. We study a generalization: Given a finite alphabet of size sigma and a word length q, what is the distribution of the number X of words (of length q) that do not occur in a random text of length n + q - 1 over the given alphabet? For q = 1, X is the number Y of empty urns with sigma urns and n balls. For q greater than or equal to 2, X is related to the number Y of empty urns with sigma(q) urns and n balls, but the law of X is more complicated because successive words in the text overlap. We show that, perhaps surprisingly, the laws of X and Y are not as different as one might expect, but some problems remain currently open.
引用
收藏
页码:73 / 87
页数:15
相关论文
共 14 条
[1]  
[Anonymous], 1977, URN MODELS THEIR APP
[2]  
[Anonymous], 1994, FDN COMPUTER SCI
[3]   Enhancement of the nucleosomal pattern in sequences of lower complexity [J].
Bolshoy, A ;
Shapiro, K ;
Trifonov, EN ;
Ioshikhes, I .
NUCLEIC ACIDS RESEARCH, 1997, 25 (16) :3248-3254
[4]   On a conjecture by Eriksson concerning overlap in strings [J].
Cakir, I ;
Chryssaphinou, O ;
Månsson, M .
COMBINATORICS PROBABILITY & COMPUTING, 1999, 8 (05) :429-440
[5]   STRING OVERLAPS, PATTERN-MATCHING, AND NON-TRANSITIVE GAMES [J].
GUIBAS, LJ ;
ODLYZKO, AM .
JOURNAL OF COMBINATORIAL THEORY SERIES A, 1981, 30 (02) :183-208
[6]   PERIODS IN STRINGS [J].
GUIBAS, LJ ;
ODLYZKO, AM .
JOURNAL OF COMBINATORIAL THEORY SERIES A, 1981, 30 (01) :19-42
[7]   MAXIMAL PREFIX-SYNCHRONIZED CODES [J].
GUIBAS, LJ ;
ODLYZKO, AM .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1978, 35 (02) :401-418
[8]   MONKEY TESTS FOR RANDOM NUMBER GENERATORS [J].
MARSAGLIA, G ;
ZAMAN, A .
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 1993, 26 (09) :1-10
[9]  
Rahmann S, 2000, LECT NOTES COMPUT SC, V1848, P375
[10]  
RAHMANN S, 2000, THESIS U HEIDELBERG