Zipf's law is not a consequence of the central limit theorem

Cited by: 27
Authors
Troll, G [1]
Graben, PB [1]
Affiliation
[1] Univ Potsdam, D-14415 Potsdam, Germany
Source
PHYSICAL REVIEW E | 1998 / Vol. 57 / No. 02
DOI
10.1103/PhysRevE.57.1347
Chinese Library Classification number
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification code
070204 ; 080103 ; 080704 ;
Abstract
It has been observed that the rank statistics of string frequencies of many symbolic systems (e.g., word frequencies of natural languages) follows Zipf's law to a good approximation. We show that, contrary to claims in the literature, Zipf's law cannot be realized by the central limit theorem(s). The observation that a lognormal distribution of string frequencies yields an approximately Zipf-like rank statistics is actually misleading. Indeed, Zipf's law for the rank statistics is strictly equivalent to a power law distribution of frequencies. There are two natural ways to perform the infinite size limit for the vocabulary. The first one is the method of choice in the literature; it makes the upper word length bound tend to infinity and, in the case of a multistate Bernoulli process, leads via a central limit theorem to a lognormal frequency distribution. An alternative way, which is actually easier to realize for text samples, is to make the lower frequency bound tend to zero. This limit procedure leads to a power law distribution and hence to Zipf's law, at least for Bernoulli processes, and to a very good approximation for natural languages, where it passes the χ² test. For the Bernoulli case we will give a heuristic proof. [S1063-651X(98)07102-5].
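The second limit procedure described in the abstract can be explored numerically. The following is only a minimal sketch, not the authors' proof or method: it assumes a Bernoulli source with three letters and a word-terminating space, with illustrative probabilities chosen here (0.4, 0.3, 0.1 and 0.2), and the function names `word_probs` and `zipf_slope` are hypothetical. It enumerates every word whose probability lies above a lower bound `min_prob`, ranks the words by probability, and fits a straight line to the log-log rank-frequency data.

```python
import math


def word_probs(letter_probs, space_prob, min_prob):
    """Probabilities of all non-empty words with probability >= min_prob.

    A word is a sequence of letters terminated by a space; its probability
    is the product of its letter probabilities times space_prob.
    """
    probs = []
    stack = [1.0]  # probabilities of letter prefixes (empty prefix = 1.0)
    while stack:
        prefix = stack.pop()
        for q in letter_probs:
            p = prefix * q
            # Extending a prefix can only lower the probability, so any
            # prefix that already falls below the bound can be pruned.
            if p * space_prob >= min_prob:
                probs.append(p * space_prob)
                stack.append(p)
    return probs


def zipf_slope(probs):
    """Least-squares slope of log(probability) against log(rank)."""
    ranked = sorted(probs, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the paper).
    letters = [0.4, 0.3, 0.1]
    space = 0.2
    for bound in (1e-4, 1e-5, 1e-6):
        probs = word_probs(letters, space, bound)
        print(bound, len(probs), zipf_slope(probs))
```

Lowering `min_prob` enlarges the vocabulary without fixing a maximum word length, which is the sense in which the lower frequency bound tends to zero; the fitted slope gives a rough indication of how close the ranked probabilities are to a power law.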
Pages: 1347-1355
Page count: 9