The meta book and size-dependent properties of written language

Cited by: 31
Authors
Bernhardsson, Sebastian [1]
da Rocha, Luis Enrique Correa [1]
Minnhagen, Petter [1]
Affiliation
[1] Umea Univ, Dept Phys, S-90187 Umea, Sweden
Source
NEW JOURNAL OF PHYSICS | 2009 / Vol. 11
Keywords
DISTRIBUTIONS;
DOI
10.1088/1367-2630/11/12/123015
Chinese Library Classification
O4 [Physics];
Discipline classification code
0702;
Abstract
Evidence is presented for a systematic text-length dependence of the power-law index gamma of a single book. The estimated gamma values are consistent with a monotonic decrease from 2 to 1 with increasing text length. A direct connection to an extended Heaps' law is explored. The infinite-book limit is, as a consequence, proposed to be given by gamma = 1 instead of the value gamma = 2 expected if Zipf's law is universally applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a certain length written by a single author has the same characteristics as a text of the same length extracted from an imaginary complete infinite corpus written by the same author.
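The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the crude tokenizer, and the choice of estimator are all assumptions made for illustration. The sketch counts distinct words in the first M tokens of a text (the quantity a Heaps-type law describes), builds the distribution of word frequencies, and applies a standard approximate maximum-likelihood estimator for a discrete power-law index, which is known to be rough when the lower cutoff `k_min` is small.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (an assumption): lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def distinct_words(tokens, length):
    """Number of distinct words among the first `length` tokens."""
    return len(set(tokens[:length]))


def frequency_of_frequencies(tokens, length):
    """Map k -> number of distinct words occurring exactly k times
    in the first `length` tokens."""
    word_counts = Counter(tokens[:length])
    return Counter(word_counts.values())


def estimate_gamma(tokens, length, k_min=1):
    """Approximate MLE of the index gamma in P(k) ~ k**(-gamma).

    Uses gamma ~ 1 + n / sum(ln(k_i / (k_min - 0.5))), a common
    approximation for discrete data; it is inaccurate for small k_min,
    so treat the result as indicative only.
    """
    word_counts = Counter(tokens[:length])
    ks = [k for k in word_counts.values() if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in ks)
    return 1.0 + len(ks) / log_sum


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    tokens = tokenize(sample)
    # Compare the estimate on text sections of increasing length.
    for length in (4, 8):
        print(length, distinct_words(tokens, length),
              estimate_gamma(tokens, length))
```

Running `estimate_gamma` on prefixes of increasing length from one long text is one way to look for the length dependence the abstract describes; a real analysis would need a much longer text and a more careful fit.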
Pages: 15