AUTOCORRELATION ON WORDS AND ITS APPLICATIONS - ANALYSIS OF SUFFIX TREES BY STRING-RULER APPROACH

被引：37

作者：

JACQUET, P ^{[1
]}

SZPANKOWSKI, W ^{[1
]}

机构：

[1] PURDUE UNIV, DEPT COMP SCI, W LAFAYETTE, IN 47907 USA

来源：

JOURNAL OF COMBINATORIAL THEORY SERIES A | 1994年 / 66卷 / 02期

基金：

美国国家科学基金会;

关键词：

D O I：

10.1016/0097-3165(94)90065-5

中图分类号：

O1 [数学];

学科分类号：

0701 ; 070101 ;

摘要：

We study in a probabilistic framework some topics concerning the way words can overlap. Our probabilistic model assumes that a word is a sequence of i.i.d. symbols generated from a finite alphabet. This defines the so-called Bernoulli model. We investigate the length of a subword that can be recopied, that is, a subword that occurs at least twice in a given word. An occurence of such repeated substrings is easy to detect in a digital tree called a suffix tree. The length of a repeated substring corresponds to the typical depth in the associated suffix tree. Our main finding shows that the typical depth in a suffix tree is asymptotically distributed in the same manner as the typical depth in a digital tree that stores independent keys (i.e., independent tries). More precisely, we prove that the typical depth in a suffix tree built from the first n suffixes of a random word is normally distributed with the mean asymptotically becoming 1/h1 log n and the variance alpha . log n, where h1 is the entropy of the alphabet and alpha is a parameter of the underlying probabilistic model. We prove these results using a novel technique called the string-ruler approach. Our results provide new insights into several algorithms on words and data compression schemes. They find direct applications in many facets of science, most notably in molecular biology, coding theory, theory of languages, and design and analysis of algorithms. (C) 1994 Academic Press, Inc.

引用

页码：237 / 269

页数：33

共 31 条

[1]

Aho Alfred V., 1974, DESIGN ANAL COMPUTER

[2] SELF-ALIGNMENTS IN WORDS AND THEIR APPLICATIONS [J].